Skip to main content

Partitioned Tables

With a partitioned table, you can divide data stored in a single table across multiple partitions, possibly stored in different databases, based on a logical scheme or on ranges of RowIDs. Any DML commands (SELECT, INSERT, UPDATE, and DELETE) work the same on partitioned tables as they do on any other SQL table, with only minor exceptions. As a result, partitioned tables can:

Create a Partitioned Table

Use CREATE TABLE with the PARTITION option to create a partitioned table.

There are two main types of partitioned tables: default-partitioned tables and key-partitioned tables. In general, the term “partitioned table” is a catch-all term to refer to a default-partitioned table or a key-partitioned table, and these terms are used only when the distinctions between them make a difference. InterSystems recommends defining a key-partitioned table rather than a default-partitioned table in most circumstances.

A default-partitioned table merely splits table data into physical buckets (which transparently increase query efficiency for very large tables) according to a row’s RowID. A key-partitioned table maps a logical scheme over these physical buckets to create partitions as specified by a partition key, which configures how rows are distributed among the partitions based on a partition key field and a partition key type.

Your choice of partition key field and type depends on your dominant query patterns (such as querying based on a range of values for a DateCreated column) and the relative business value different partitions have (for example, data added over a year ago may be of lower value than more recent data). There are three distinct partition key types: RANGE, LIST, and HASH. You may define a multi-field partition key to define multiple, nested levels of partitions.

Specifying a field as part of a partition key causes the field to be ReadOnly. Attempting to update a field that is part of the partition key returns an error.

A partitioned table must have system-assigned RowIDs.

Note:

Currently, sharded tables and tables that use columnar storage, VECTOR-typed fields, computed fields, or functional indexes cannot be partitioned. These limitations will be lifted in a future release. See Known Issues and Notes for more information.

Partition Key Types

Partition By Range

When partitioning data by range, specify the size of the interval that should separate partitions. All the partition key field values within a single partition will be within the same specific interval (for example, the same year or set of three months). The partition key field used for this partition key type must be a date type (for example, DATE or TIMESTAMP) or an exact numeric type (for example, INTEGER or DOUBLE). If the field is of a date type, you must include a unit of time (such as YEARS or MONTHS) when defining the interval size.

For example, to create a partition key on the TransactionDate field in intervals of three months, define a partitioned table as follows:

CREATE TABLE Store.Transactions (
    CustomerName VARCHAR(50),
    TransactionDate DATE,
    StoreLocation VARCHAR(25)
) PARTITION BY RANGE (TransactionDate) INTERVAL 3 MONTHS

The resulting table partitions any rows added into the table based on which three-month interval it belongs to.

As a second example, to create a partition key on the CustomerID field in intervals of 100, define a partitioned table as follows:

CREATE TABLE Store.Orders (  
   OrderID INTEGER,  
   CustomerID INTEGER,  
   OrderTotal NUMERIC(12,2)
) PARTITION BY RANGE (CustomerID) INTERVAL 100
Partition By List

When partitioning data by list, each unique value in the partition key field corresponds to a unique partition. All rows stored in a given partition have the same value for the partitioned field. The partition key field used for this partition key type must employ EXACT collation.

As the system creates a new partition for each distinct value, consider how many distinct values you expect to appear in the field. The query performance benefits of this partition key type degrade if used on a field containing many distinct values. Instead, in such situations, consider using the HASH partition key type.

To create a partition key for the unique values in the StoreLocation field, define a partitioned table as follows:

CREATE TABLE Store.Transactions (
    CustomerName VARCHAR(50),
    TransactionDate DATE,
    StoreLocation VARCHAR(25) COLLATE %EXACT
) PARTITION BY LIST (StoreLocation)

The above table partitions rows based on the value of the StoreLocation field.

Partition By Hash

When partitioning data by hash, define a fixed number of partitions. Rows added to the table are distributed equally across those partitions by dividing a generated hash value for the partition key field (n) by the number of partitions (m) and taking the remainder (equivalent to the operation n mod m). All rows with identical values for this modulo operation are stored in the same partition.

If the data in the field you are partitioning has many distinct values, then the data in your table is evenly distributed across the number of partitions you have defined. However, if there are only a few distinct values, you should consider using the LIST partition type.

The following example creates a partition key on the TransactionGUID field, defining a partitioned table with exactly four partitions and distributing its data evenly across the partitions:

CREATE TABLE Store.Transactions (
    TransactionGUID VARCHAR(128),
    TransactionDate DATE,
    StoreLocation VARCHAR(25)
) PARTITION BY HASH (TransactionGUID) IN 4 PARTITIONS

The above table partitions rows based on a generated hash of the value in the TransactionGUID field.

Multi-Field Partition Keys

When defining a partitioned table, you may choose to define a partition key that includes multiple fields to finely tune how your data is distributed across your partitions. To do this, simply add another partition key type clause to the end of your CREATE TABLE statement, separated by a comma. You can add any number of fields to a partition key, but most key-partitioned tables perform adequately with one or two, as is considered useful based on expected query patterns.

The following table defines a two-field partition key on the Store.Transactions table. The table is first partitioned according to the value in the StoreLocation field; this is the outermost partition level. Then, each StoreLocation partition is further divided into sub-partitions in intervals of three months according to the value in the TransactionDate field; this is the inner partition level. The resulting partitions contain transactions from a single store location during a certain three-month period.

CREATE TABLE Store.Transactions (
    CustomerName VARCHAR(50),
    TransactionDate DATE,
    StoreLocation VARCHAR(25) COLLATE %EXACT
) PARTITION BY LIST (StoreLocation), RANGE (TransactionDate) INTERVAL 3 MONTHS

Improved Query Performance Through Partition Pruning

One of the benefits of partitioned tables is their ability to improve query performance through partition pruning, which is employed on queries that use a WHERE clause condition on a partition key field. Each partition in a table is identified with a partition ID, which is derived from a combination of the partition key field and partition key type. The query optimizer looks at the WHERE clause of a query, identifies whether it involves a partition key field, and, where possible, uses the constraints of the WHERE clause condition to prune scans of irrelevant partition IDs from the query plan. As a result, partition pruning can dramatically shrink the amount of data that the query needs to look at in order to return the proper rows and greatly improve query performance.

Partition pruning is employed as an automatic, standard optimization. Before defining a partition key, consider how frequently you expect to employ WHERE clauses on certain fields. If, for example, you anticipate frequently executing queries that would benefit from optimizations of range conditions, you may decide to define a RANGE partition key on a field frequently referenced in a WHERE clause. These planning decisions must be made before defining a partition key, as you cannot change a partition key once it is defined.

View Partition Metadata

Once you have created a partitioned table, information about your partitions, partition keys, and partition mappings is cataloged for you to view at any time through the Catalog Details in the Management Portal or the tables in the INFORMATION_SCHEMA schema. The INFORMATION_SCHEMA.TABLES table contains a column, IS_PARTITIONED, that you can query to find out which tables are or are not partitioned.

  • INFORMATION_SCHEMA.TABLE_PARTITIONS: Contains information about each partition in the current namespace, including the partition ID, the schema and table that the partition belongs to, the number of rows and buckets stored, the estimated size of the data, and the database location to which the partition is mapped. This table does not catalog any information for default-partitioned tables, as they only divide data into buckets, not key-based partitions; query INFORMATION_SCHEMA.TABLE_PARTITION_BUCKETS instead.

  • INFORMATION_SCHEMA.TABLE_PARTITION_BUCKETS: Contains information about each bucket in the current namespace, including the number of rows, the estimated size of the data stored in each bucket, and the database location to which the partition is mapped. Buckets are used for both default-partitioned and key-partitioned tables, so INFORMATION_SCHEMA.TABLE_PARTITION_BUCKETS contains information about both default-partitioned and key-partitioned tables.

  • INFORMATION_SCHEMA.TABLE_PARTITION_KEY: Contains information about specific partition keys, including the table and schema the partition key applies to, the partition key field, and the partition key type. There is one entry in this table per partition key field; thus, if a table has a multi-field partition key, there is one row per each field in the partition key. This table includes a PARTITION_LEVEL column that indicates the order of nested partitions, with level 1 representing the outermost partition level.

  • INFORMATION_SCHEMA.TABLE_PARTITION_MAPPINGS: Contains information about each partition mapping in the namespace, including the rule that defines the mapping, the table and schema, the number of rows and buckets affected by the mapping, the database location that the partition is mapped to, and the estimated size of the data. Each entry in the table corresponds with a partition that is mapped to a database that is different from the namespace’s default database.

Move Partitions Between Databases

One of the crucial advantages of a partitioned table is the ability to move partitions to different databases, where they can still be queried as normal. Some benefits of this approach include separating a very large table running up against database size limits into smaller pieces stored in different databases or storing infrequently queried data on a cheaper tier of storage.

The ALTER TABLE command with the MOVE PARTITION option supports moving a partition or range of partitions to a specified location, creating mappings to the new location. This command can be used either before or after the table has been populated with data. Any data already in the specified partition or range of partitions is moved to the newly specified location. Any data subsequently added to the table is stored according to the locations specified in previous ALTER TABLE <tablename> MOVE PARTITION commands. The system transparently uses the low-level DataMove utility to move stored data to the new location defined by the mapping. No direct use of DataMove by the user is required to move partitions, but familiarity with the concepts may help with interpreting log and error messages.

Note:

Moving partitions between databases with DataMove may result in significant additional journaling.

Prerequisites for Moving Partitions
  1. Plan out the databases you wish to use. You can use preexisting databases to store partitions, and any database used for table partitioning can be used for other purposes as well. However, depending on performance and space issues for your application or configuration, you may prefer to define new databases. You may define such additional databases with the CREATE DATABASE FILE command, through the Management Portal, or with the ^DATABASE utility.

  2. Configure access to these databases by granting appropriate users access to the corresponding database resources.

  3. Make sure that “Journal freeze on error” is enabled. From the Management Portal, navigate to System Administration > Configuration > System Configuration > Journal Settings. Then, verify that the “Freeze on error” checkbox is selected.

The partition or range of partitions to be moved can be specified by partition ID or by range of partition key field values or partition IDs.

Move Partitions by Range of Partition Key Field Values

The easiest way to move partitions is by specifying values for the partition key fields, allowing the system to derive the corresponding partitions. Identify the range of values that you would like to move to a different database by using ALTER TABLE with the MOVE PARTITION BETWEEN option. The range you provide must be of the type of the partition key field and is inclusive of both the starting and the ending value. You may not specify a range of values for a table partitioned with a HASH-typed partition key.

For example, on a table that is key-partitioned on a date field, the following command identifies all partitions from the year 2023 and moves them (along with their data) to a database named data-2023.

ALTER TABLE demo.log MOVE PARTITION BETWEEN '2023-01-01' AND '2023-12-31' TO data-2023

A range moves all partitions that it encompasses. Additionally, partitions are always moved in full, so it is not necessary to identify the exact endpoints of your range. If your data is partitioned, for example, in intervals of one year, the ranges BETWEEN '2025-01-01' AND '2025-12-31' (which identifies the entire calendar year) and BETWEEN '2025-01-01' AND '2025-01-02' (which only identifies a span of two days within the same year) both move all the data for the year 2025 to the specified database. On the same table, the range BETWEEN '2025-01-01' AND '2027-01-01' moves the partitions for the years 2025, 2026, and 2027 to the specified database.

Move Partitions by a Partition ID or a Range of Partition IDs

Partitions are assigned a partition ID implicitly, based on their partition key, which can be used as an alternative means of moving individual partitions. There are multiple ways to determine the partition IDs you would like to move:

  • On tables with a LIST partition key type, the partition key field value is identical to the partition ID, so you can specify which partition to move by using the field value.

  • Issue a query on the x__partitionid1 column with a WHERE clause that identifies the range of partition key field values to be moved.

    Note:

    On a table with a multi-field partition key, x__partitionid1 indicates the outermost partition ID. The inner partition ID fields are stored in x__partitionid2, x__partitionid3, and so on, with each higher number indicating another inner layer of partitioning.

    For example, the following query would return the distinct partition IDs of rows that have a TransactionDate in the years 2023, 2024, or 2025:

    SELECT DISTINCT(x__partitionid1) from Demo.Inventory 
      WHERE TransactionDate >= TO_DATE('01 JAN 2023') 
      AND TransactionDate <= TO_DATE('01 JAN 2026')
  • View your partition schema by querying INFORMATION_SCHEMA.TABLE_PARTITIONS or look in the Catalog Details and identify possible partition IDs.

Once you have identified the partition ID you would like to move, use ALTER TABLE with the MOVE PARTITION ID option to move a partition ID or a range of partitions IDs to the specified database. For example, the following command moves the partition with ID 202411010000 to the database named data-2024.

ALTER TABLE Demo.Inventory MOVE PARTITION ID 202411010000 TO data-2024

As a second example, the following command, for a table with a LIST partition key, moves the partition with the value 'alabama' in its partition key field to the us-south database.

ALTER TABLE Demo.Inventory MOVE PARTITION ID 'alabama' TO us-south

You may also specify a range of partition IDs to move. Each partition within the range, including the two values specified, is moved to the specified database. For example, the following command moves all partitions with a partition ID between 202411010000 and 202506010000 to a database named prerelease.

ALTER TABLE Demo.Inventory MOVE PARTITION ID BETWEEN 202411010000 AND 202506010000 TO prerelease

On both default-partitioned and key-partitioned tables, you can move specific buckets between different databases by identifying the bucket ID, which is an integer. To do so, use <partition id>||<bucket id>. For example, the following command moves the bucket with bucket ID 3 within the partition with partition ID 202411010000 to the prerelease database.

ALTER TABLE Demo.Inventory MOVE PARTITION ID '202411010000||3'  TO prerelease

You may also move a range of bucket IDs from a single partition to a new database. For example, the following example moves buckets with a bucket ID between 3 and 5 from the partition with partition ID 202610140000 to the test-db database.

ALTER TABLE Demo.Inventory MOVE PARTITION ID 202610140000 BETWEEN 3 AND 5 TO test-db
Move Inner Partitions on a Table with a Multi-Field Partition Key

You can move the inner partitions of a table with a multi-field partition key to a different database without moving the outer partitions. To do so, identify the partition ID for the inner partition you would like to move, which comes in the form <outer partition ID>||<inner partition ID>. The following query returns the correctly formatted multi-field partition keys for each row in the Demo.Inventory table.

SELECT x__partitionid1 || '||' || x__partitionid2 FROM Demo.Inventory

On a table that has an outer partition of type LIST on a string-typed field and an inner partition of type RANGE on a date-typed field, the following command moves the inner partition with partition ID 'Massachusetts||202601010000' to the test-db database.

ALTER TABLE Demo.Inventory MOVE PARTITION ID 'Massachusetts||202601010000' TO test-db

You may also move a range of inner partition IDs to a different database. To do so, specify the outer partition ID and then use BETWEEN to identify the range of inner partition IDs you would like to move. For example, on a table that has an outer partition of type RANGE on a date-typed field and an inner partition of type HASH, the following command moves the inner partitions with partition IDs between 5 and 8 to the test-db database.

ALTER TABLE Demo.Inventory MOVE PARTITION ID 202504030000 BETWEEN 5 AND 8 TO test-db

Partitioned Table Conversions

You may convert a table between a partitioned and nonpartitioned format using DDL. The ALTER TABLE command supports options for turning partitioning on or off and for defining partition keys. You may continue querying the table and using INSERT, UPDATE, and DELETE commands to change its data while the table conversion command is running.

A table cannot be converted between a partitioned and nonpartitioned state if it is defined with USEEXTENTSET=0.

Important:

When you convert a nonpartitioned or default-partitioned table to a key-partitioned table, the conversion process changes the RowIDs in your table. To enable foreign keys and references from other tables to continue working, the system creates a surrogate ID field (to preserve the old RowID values) and a surrogate ID index (to translate the old RowIDs to the new IDs). This can cause query and update performance to be slower than if the same table was originally created with the same partition key.

Convert a Nonpartitioned Table to a Default-Partitioned Table

To convert a nonpartitioned table to a default-partitioned table, use ALTER TABLE with the CONVERT PARTITION ON option. Turning partitioning on for a table does not define a partition key, but rather stores the rows of the table in buckets based on ranges of RowIDs. The example below demonstrates the use of this option.

ALTER TABLE Store.Transactions CONVERT PARTITION ON

You can subsequently use ALTER TABLE with the CONVERT PARTITION BY option to define a partition key; however, InterSystems strongly recommends performing such a conversion in a single step.

Note:

A nonpartitioned table cannot be converted to a partitioned table if it is sharded, uses column-based storage, does not have a system assigned IDKey, or has VECTOR-typed fields, computed fields, or functional indexes.

Convert a Nonpartitioned or Default-Partitioned Table to a Key-Partitioned Table

To convert a nonpartitioned or default-partitioned table to a key-partitioned table, use ALTER TABLE with the CONVERT PARTITION BY option. However, if you have already defined a partition key on a table, you cannot change the partition key, convert it to a multi-field partition key, or convert the table to a default-partitioned table.

ALTER TABLE Store.Transactions CONVERT 
    PARTITION BY RANGE (TransactionDate) INTERVAL 3 MONTHS
Convert a Partitioned Table to a Nonpartitioned Table

To convert a partitioned table to a nonpartitioned table, use ALTER TABLE with the CONVERT PARTITION OFF option to turn off the partitioning mechanism. Performing such a conversion moves the data back to the default globals database for the namespace, removes all storage mappings, and removes the partition key. The example below demonstrates the use of this option.

ALTER TABLE Store.Transactions CONVERT PARTITION OFF

Surrogate IDs

To maintain RowID consistency and support partitioned table conversions, converting to or from a key-partitioned table results in the creation of a surrogate ID field and a surrogate ID index.

When a table is partitioned with a partition key, the RowIDs in the table are changed from their initial form (a default, system-assigned integer) to a string identifying the partition containing the row and a unique identifier within the partition. To ensure that previously defined foreign keys or other references to this table from other tables still work, the system stores the preconversion RowID value as a surrogate ID, and creates a surrogate ID index that maps the surrogate ID values to the new RowIDs, ensuring referential integrity among all entities in your schema.

There is a performance cost associated with querying and maintaining the surrogate ID field and surrogate ID index. Queries that explicitly refer to the RowID must perform a conversion between the surrogate ID and the current RowID before returning results, and so perform more slowly after conversion. Inserting data into a table with a surrogate ID index also performs slower after conversion, because of the need to update the index. If you are certain that there are no foreign keys or references from other tables referring to a table with a surrogate ID and surrogate ID index, you can drop the surrogate ID and its index after the conversion is completed to avoid the performance costs.

Note:

A nonpartitioned table that has a surrogate ID (which only occurs when a key-partitioned table is converted to a nonpartitioned table) cannot be converted to a key-partitioned table unless you first drop the surrogate ID.

Limitations of Partitioned Table Conversions

Due to the structure of surrogate IDs, some conversions between partitioned and nonpartitioned tables are not possible. These restrictions are:

  • You cannot convert a nonpartitioned table to a key-partitioned table if the table has an auto-increment column.

  • You cannot convert a key-partitioned table to a default-partitioned table.

  • You cannot convert a nonpartitioned table to a key-partitioned table if the table already has a surrogate ID field and surrogate ID index. This path only arises as a reconversion, where the nonpartitioned table was originally a key-partitioned table. To perform this reconversion, first verify that there are no foreign keys or references to the old RowIDs from other tables that must be preserved. Then, drop the surrogate ID field and surrogate ID index and you will be able to perform the conversion.

    Important:

    If there are foreign keys or references from other tables referring to this table, they will no longer work after dropping the surrogate ID field and surrogate ID index.

Drop or Truncate a Partition

You may drop or truncate a single partition from a partitioned table using ALTER TABLE, either by individual ID or by range of either IDs or partition key value field values.

Dropping a partition deletes the data in the partition and deletes any mappings.

Truncating a partition deletes the data in the partition, keeping intact any mappings to a non-default database you created when moving the partition so that any data subsequently inserted into the partition is stored in the same non-default database.

Drop or Truncate Partition by Range of Partition Key Field Values
ALTER TABLE Demo.Inventory DROP PARTITION BETWEEN '2023-01-01' AND '2023-12-31'
ALTER TABLE Demo.Inventory TRUNCATE PARTITION BETWEEN '2023-01-01' AND '2023-12-31'
Drop or Truncate Partition by Individual ID

For a standard key-partitioned table:

ALTER TABLE Demo.Inventory DROP PARTITION ID 202301010000
ALTER TABLE Demo.Inventory TRUNCATE PARTITION ID 202301010000

For a multi-field key-partitioned table:

ALTER TABLE Demo.Inventory DROP PARTITION ID '202301010000||4'
ALTER TABLE Demo.Inventory TRUNCATE PARTITION ID '202301010000||4'
Drop or Truncate Partition by Range of Partition IDs
ALTER TABLE Demo.Inventory DROP PARTITION ID BETWEEN '202301010000' AND '202312310000'
ALTER TABLE Demo.Inventory TRUNCATE PARTITION ID BETWEEN '202301010000' AND '202312310000'

To drop an entire partitioned table, use the DROP TABLE command. Dropping a partitioned table deletes the data in the partition and any mappings.

To truncate an entire partitioned table, use the TRUNCATE TABLE command. Truncating a partitioned table deletes the data in the partition, but keeps all mappings intact.

FeedbackOpens in a new tab