We often use
NULL values in date ranges, usually in the end date to signify the indefinite future. I have decided that this is a bad idea; here are my reasons.
[Update 2014-07-04] There have been some great replies to this post. I am going to address some of their points.
1) NULL should not mean anything
According to the SQL standard,
NULL means “unknown”. That is why you can’t really compare a
NULL value to anything, you can only say if it
IS NULL or not.
If you use
NULL in date ranges, it doesn’t just mean something, it means two different things! In the start date it means “from the beginning of time”, and in the end date it means “‘until the end of time”. This in not only contrary to the “official” meaning of NULL, it is confusing.
[Update] Kevan Gelling points out that there may really be “unknown” values, in which case NULL should be allowed. I agree, as long as NULL is reserved for that use.
Jeffrey Kemp says: ‘In most cases an End Date of NULL means “we don’t know (yet) what the end date will be – or even if it will ever end”.’ Jeffrey, you should add “we don’t even know if it has already ended”! Your phrase taken alone implies that the end date is in the future, not in the past. This makes NULL meaningful.
2) We can use real date limits instead of NULL
One argument for using
NULL is that it means the earliest possible start date, or the latest possible end date. We don’t need
NULL for that! In Oracle, the earliest possible date is 4712-01-01 B.C., or
DATE '-4712-01-01'. The latest possible date is
DATE '9999-12-31' (you can add the time element 23:59:59 if you want to be a purist.)
To enforce this, I suggest declaring the start and end date columns as
NOT NULL with default values. In Database 12c, I would use the
DEFAULT ON NULL clause: this clause puts the default value in the column even if you explicitly try to put a
If you want the output to show
NULL instead of these default values, you do have to use something like
NULLIF() in your
[Update] Again, Kevan Gelling argues that NULL may be necessary to indicate a value that is not known. In that case my suggestion is no good, but you still have to use something other than NULL to indicate “beginning of time” and “end of time”.
Several posters bring up the fact that artificial extreme dates will “skew” the data and make it hard for the optimizer to choose the best plan. They are right, I should mention this. However, the data is skewed whether I use extreme dates or NULL. Supposing I do use NULL to mean “the end of time”, many queries will have to use
END_DATE IS NULL in the
WHERE clause, so the skew is there no matter what.
3) Oracle doesn’t index NULLs
When you query data with date ranges, you often have to check “greater than” one value and “less than” another value. This may require separate indexes on start date and end date. If you have
NULLs in your date ranges, those indexes will not be used since Oracle doesn’t create index entries when all the indexed columns are
If you use real values and
NOT NULL, your indexes will always work.
[Update] Some readers apparently didn’t see the phrase that I have now underlined. The header was probably misleading: Oracle will indeed index a row if any of the indexed columns is not
Some objected that indexes on dates alone are rarely used: date ranges are almost always applied to some object, so the index will include the object and one or both dates. In that case, the rows with
NULL dates will be indexed. I agree. In that situation, NULLs don’t cause a problem, and “skew” won’t either, as long as the object comes before the date in the index column list.
4) Queries on NULLable date ranges are hard
I have blogged about various date range problems: finding gaps, finding overlaps and merging contiguous ranges. Almost always, my solutions worked fine without
NULLs and broke when I introduced
NULLs required either more complex queries or substituting real values for
NULL. Why go to all that work when we can just use the real values to begin with?
[Update] Even if we need NULL values sometimes to mean “unknown value”, we still don’t use
IS NULL in our queries, so indexes should work when needed.
5) Unique constraints don’t work with NULL
Most queries on date ranges, including mine, assume that start dates are unique. In some cases, it may also be useful to have unique end dates. The only way to make sure they are unique is to add a unique constraint. Unfortunately, uniqueness is only enforced for
NOT NULL values.
Most of the time, your constraint will be on two columns: some object and a date. In this case the unique constraint will work as long as the object column is
NOT NULL. However, anytime you need unique dates throughout a table you must define them as
NOT NULL and use default values.
[Update] As mentioned under paragraph 3), this point is only valid for unique constraints on the date alone. This is probably a rare case. Also, as Kevan reminded me, you could always use a function-based index on (DATE_COL, ‘X’) to make sure every row was indexed.
[Update] Conclusion: NULL should mean “unknown” only
As readers have pointed out, my arguments 3) and 5) are weak, since they only apply to indexes or constraints on a date column alone, and those are rarely needed.
I’ll stand by arguments 1), 2) and 4), but thanks to Kevan I’ll add one thing: just because I don’t use
NULL to mean “beginning or end of time”, that doesn’t mean I might not need it to mean “unknown value”. In that case, default values may not be the way to go.
Finally, please read the replies: good stuff, and thanks to all!