Concepts

"Make the IoT data simple!"

In JoinBase, we hope that users can use the database well with nothing more than simple intuition and experience. We expose only the minimal set of concepts that users must know under current technical conditions. Beyond these, you do not need any other data-domain knowledge to use JoinBase well.

SQL

In JoinBase, you only need to understand plain SQL. Because SQL is declarative, JoinBase is designed so that intuition is basically all you need to use it well.

For the same reason, we do not plan to copy all of SQL into JoinBase's language. We will stay compatible with its most intuitive parts, discard the unintuitive parts, and extend it in areas that it does not yet cover but that should be simpler and more intuitive in a great big data analysis tool.

Partition

For the IoT scenario, the data generated by devices can be regarded as endless. Therefore, putting all the data into one place is neither necessary nor practical.

A partition in JoinBase is simply a subset of the data (records), sliced out by the partition expression that you specify on a column or a list of columns when creating the table schema.

Furthermore, a partition in JoinBase acts as a unit of data skipping that reduces the dataset a query has to scan. By skipping partitions that are not of interest, you can keep your queries lightning fast even when the total dataset in the database grows without bound. We have carefully built an engine that can withstand most pathological partition shapes, but a wrong partition schema may still greatly affect query performance. Because of the great real-world flexibility that sits on top of JoinBase's general SQL model, we think it is better to expose the concept of partitions to users for now.
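To make the idea of data skipping concrete, here is a minimal Python sketch. It is not JoinBase's engine or API; the function names `build_partitions` and `query` and the sample sensor records are hypothetical, chosen only to show how a query that targets one partition never reads the rows of the others.

```python
from collections import defaultdict


def build_partitions(records, partition_key):
    """Group records into partitions keyed by partition_key(record)."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_key(record)].append(record)
    return dict(partitions)


def query(partitions, wanted_keys, predicate):
    """Scan only partitions whose key is in wanted_keys; skip the rest."""
    scanned = 0
    matches = []
    for key, rows in partitions.items():
        if key not in wanted_keys:
            continue  # the whole partition is skipped without reading its rows
        scanned += len(rows)
        matches.extend(row for row in rows if predicate(row))
    return matches, scanned


# Hypothetical sensor readings: (day_number, device_id, temperature)
records = [
    (19000, 1, 20.5),
    (19000, 2, 21.0),
    (19001, 1, 22.5),
    (19002, 1, 30.0),
    (19002, 2, 31.5),
]

# Partition by the first field (the day number).
partitions = build_partitions(records, partition_key=lambda r: r[0])

# Only the partition for day 19002 is scanned; the other rows are never read.
matches, scanned = query(
    partitions,
    wanted_keys={19002},
    predicate=lambda r: r[2] > 30.0,
)
print(matches, scanned)
```

In this sketch the cost of the query depends on the size of the selected partition, not on the total number of records, which is why a badly chosen partition key (for example, one that puts nearly everything into a single partition) removes most of the benefit.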

You should know the following partition-related concepts to create a correct schema.

Partition Keys

Partitioning of a table should be based on the table's data. Commonly, you use one or more columns of the table. These selected columns are the so-called partition keys.

Currently, we support only a single column as the partition key, and the partition value must be an unsigned 64-bit integer. In the future, we will support multiple columns as a compound partition key.

Partition Expression

Sometimes it is not convenient to use the column’s value directly. Instead, it is better to use a value derived from the column.

For example, it is natural to partition a time series table by a time unit, such as a day. But you may have only a Timestamp-typed column in your table. There are two ways to solve this:

You add a new column, such as date, as the partition key to represent the day partition granularity.
It is easier to just tell the database how to calculate the partition key (at day granularity) from the existing timestamp. This is the so-called partition expression.
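The second approach can be pictured as a pure function from a timestamp to a day bucket. The Python sketch below illustrates that idea only; it is not JoinBase syntax or one of its built-in functions. The name `day_partition` and the choice of "whole days since 1970-01-01 UTC" as the bucket value are assumptions made for the example, picked because the result fits the unsigned 64-bit integer requirement for partition values.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400
U64_MAX = 2**64 - 1


def day_partition(ts: datetime) -> int:
    """Return whole days since 1970-01-01 UTC for a timezone-aware datetime."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    seconds = ts.timestamp()
    if seconds < 0:
        # An unsigned partition value cannot represent times before the epoch.
        raise ValueError("timestamps before 1970-01-01 UTC are not supported")
    day = int(seconds) // SECONDS_PER_DAY
    if day > U64_MAX:
        raise ValueError("partition value does not fit in 64 bits")
    return day


# Two readings on the same UTC day fall into the same partition;
# a reading just after midnight falls into the next one.
morning = datetime(2022, 1, 1, 8, 30, tzinfo=timezone.utc)
evening = datetime(2022, 1, 1, 23, 59, 59, tzinfo=timezone.utc)
next_day = datetime(2022, 1, 2, 0, 0, 0, tzinfo=timezone.utc)

print(day_partition(morning), day_partition(evening), day_partition(next_day))
```

Deriving the key this way keeps the table schema unchanged: only the timestamp is stored, and the partition value is computed from it.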

For performance and security reasons, we support only several common functions (mainly date-time related) as partition expressions. See the language reference page for more details.

Currently, you may combine the two approaches above if you want a more complex partition schema. We will continue to improve partition expression support.