Sunday, June 3, 2012

Manifesto of Time Intervals

Introduction

This is a note about the handling of time intervals within IT systems. The note contains observations of a general nature only.

Within multiple customer projects, I have experienced the specific effects at all levels of detail and for all kinds of reason. I have seen the handling of time intervals go wrong and lead to most obscure, complex code which can be corrected by the highest skilled programmers only.

General

In a typical IT system, there are many data that are valid within a given time only. This applies especially to systems of an administrative nature, which are user-facing screens and persisted data of many kinds. Validity of data is reflected by dates, time stamps and time intervals. As an expression of periods of validity and durations are intervals especially interesting.

Across development in general and across people with different focus, there is broad consensus that a time interval is expressed by the indication of two dates or two time stamps and that these time intervals is something that you can handle right away and in a good way. The mathematical interpretation is that a time interval is simply a contiguous segment of time in the one-dimensional space. It should be simple to handle.

When you want to handle it, there are several issues that require consideration and must be handled seriously: Granularity, homogeneity, representation. The following is a short presentation of these issues. Even though the presentation is somewhat abstract, the effects are shown very specifically.

Granularity

The subject of granularity is found in several variants and for a number of reasons. Granularity is emerging as an issue because some people focus on an application's presentation and quickly conclude that the validity from a working standpoint is obvious throughout the day. Here one thinks like that when one thing is valid until January the 10th and the second thing applies with effect from January the 11th, then these are an expression of disjoint intervals:


[…; January 10th 2011] , [January 11th 2011; …]

There is no doubt as to which interval a day belongs to.

Some people like an IT architect focus on the domain logic of an application and they think that the user-facing screens of the application tend to change over time. What initially could be formulated as valid in terms of whole days may change to be formulated in parts of the day and counted in hours, minutes and seconds. While durations of time initially appears to be best presented in terms of hole days, it is best to avoid tying this level of granularity into the domain logic. It is more tractable not having to modify the domain logic at all, rather than having a certain granularity bonded into the application in both the presentation and in the domain logic. With a focus on domain logic, it is natural to choose a finer granularity, which includes not only whole days, but seconds and perhaps even milliseconds. The previous time intervals in the presentation is thus to be presented in a different way and including a finer granularity:


[...; 2011-01-10 23:59.59.999], [2011-01-11 00:00:00.000; ...]

Just as the domain logic is an issue in relation to the presentation, the domain logic is also an issue in relation to the persisted data. Persisted data may have a coarser granularity like the presentation of the data, or the persisted data may have varying granularity where some parts are persisted at a coarse granularity in whole days and where some parts are persisted at a finer granularity including seconds. When any given set of persisted data is inspected from the perspective of the values alone, it can be quite difficult to determine if the data represent closed intervals like –


[...; January 10th 2011]

– or if the data represent open intervals like –


[...; January 10th 2011[

– which easily makes a difference since there is a day in between. Only when making a more active role and examining how consecutive, disjoint intervals are represented, it becomes possible to determine if the end-point in one interval is different from the starting point in the consequtive interval –


[...; January 10th 2011] , [January 11th 2011; ...]

– or if the two points are equal, which opens up for choosing between either the interpretation –


[...; January 10th 2011[ , [January 10th 2011; ...[

– or the interpretation –


]...; January 10th 2011] , ]January 10th 2011; ...]

– which in turn is a day of a difference. For data with a coarser granularity like whole days it becomes more important to handle the endpoints correctly as either open or closed and at the correct endpoints.

Homogeneity

A handling of data is found in the domain logic which is serving a purpose between the presentation and the persistence. It is therefore natural to focus on ensuring that there is one common way to express time intervals. If it is possible to avoid having functionality to compare both open and closed intervals, if it is possible to avoid having functionality to compare time intervals at different levels af granularity like whole days and seconds, and if it is possible to avoid comparing intervals across both, then domain logic becomes tractable. It is preferable to have a uniform representation.

As part of a presentation, time intervals are selected and displayed, but never manipulated or modified. It is possible that some trickery is required when selecting endpoints. If a user selects dates as an interval like –


[January 15th 2011; January 25th 2011]

– then it may happen that the communication between the presentation and the domain layer should be at a different level of granularity such as –


[2011-01-15 00:00:00; 2011-01-25 23:59:59]

– or as –


[2011-01-15 00:00:00; 2011-01-26 00:00:00[

– and depending upon how the domain logic is constructed. As part of a presentation, it is possible to handle the fact that an endpoint chosen as January 25th is send to the domain logic as January 26th, and that an endpoint received as January 26th is presented as January 25th. When a presentation is free from changing the meaning of time intervals as it is otherwise done within the domain logic, then it is possible to perform these conversions in a good and sound way.

Something similar is true regarding persistence. When the meaning of time intervals is kept unchanged, then it becomes possible to handle conversions between domain logic and persistence.

It is important that the domain logic has a single, homogenous handling of time intervals. This can be implemented directly by explicitly using only a single representation of time intervals. Or it can be implemented indirectly by means of using one of the more advanced libraries using various representations of time intervals and where conversions are used and performed without ever exposing these to the actual domain logic.

Representation

Most IT systems are administrative in nature, where time intervals are related to human time. The quick conclusion is to agree upon that when time intervals are represented, then it is sufficient to have endpoints expressed with an accuracy of seconds and perhaps even with millisecond accuracy [1].

This means that where endpoints are shown as part of a presentation and the endpoints are chosen throughout the day –


[January 15th 2011; January 25th 2011]

– it is most immediate to have the domain logic settle on a representation with a finer granularity as –


[2011-01-15 00:00:00.000; 2011-01-25 23:59:59.999]

– which has a precision including a resolution of milliseconds. This must be perfect, right?

Man's time after the clock and the sun is continuous. A digital machine time is discrete and is measured with the representation above in whole milliseconds. If man and the machine want to represent two time intervals which meet at the change between January the 25th and January the 26th, then it must be represented as –


[...; 2011-01-25 23:59:59.999], [2011-01-26 00:00:00.000; ...]

Here, the granularity in milliseconds is expressed by the fact that the value for the end-point in the first time interval is not strictly equal to the value for the starting point of the second time interval. To construct these intervals, it is required to handle the finer granularity explicitly by subtracting a millisecond. This handling is explicitly expressed in the domain logic of commonly occurring situations where the validity is measured at a much coarser granularity like whole days or whole hours.

What also tend to happen is that the time interval is saved in a database that happens to have a granularity of seconds. The interval like the above having a slightly finer granularity including milliseconds is persisted at a coarser granularity without the last fraction –


[...; 2011-01-25 23:59:59], [2011-01-26 00:00:00; ...]

– and then, later, when the interval is read again, then the interval without a cumbersome management becomes something like –


[...; 2011-01-25 23:59:59.000], [2011-01-26 00:00:00.000; ...]

The initial interval endpoint was 2011-01-25 23:59:59.999 but became 2011-01-25 23:59:59.000, which is not the same value.

So a presentation can count on throughout the day while the domain logic may have a homogeneous representation in milliseconds and the relationship between milliseconds and seconds are not handled properly. This is going to occur even if only a coarser resolution like whole days is used and without using hours or minutes at all. This is not good.

For both to get the continuous to meet with the discreet and to avoid being forced into lengthy conversions to grassland, and without the intervening granularities in hours and minutes are used, it is advantageous to use time intervals which are open at the end point. The time intervals are then represented as –


[...; 2011-01-26 00:00:00.000[, [2011-01-26 00:00:00.000; ...[

– and should the intervals happen to be persisted without milliseconds as –


[...; 2011-01-26 00:00:00[, [2011-01-26 00:00:00; ...[

– then they can be read again by simply resetting the not persisted fraction containing milliseconds to 000 –


[...; 2011-01-26 00:00:00.000[, [2011-01-26 00:00:00.000; ...[

– so that what was written is equal to what is being read. As long as the presentation only shows days, hours, minutes and seconds, there are no distinctions between the continuous and the discrete. The action required to make a new time interval that starts where the previous interval ended, becomes a very trivial assignment of endpoints.

Whether to use open or closed intervals is absolutely not trivial.

Conclusion

To summarise, the overall conclusion is this:


In general IT systems of an administrative nature, it is advantageous to represent the validity of data by time as intervals where the granularity is milliseconds and the interval is half open and as indicated by the form of this example:

[2011-01-01 00:00:00.000; 2011-02-01 00:00:00.000[

Regardless of the details that lead to this conclusion, there are signs that others have reached the same result. One can look at libraries, where many libraries represent times and dates in many guises, but only few have representations of time intervals. One of the better libraries is Joda Time [2]. When looking closely at this library, it is worth noticing that the definition of a time interval [3] is documented as follows:


...
A time interval represents a period of time between two instants. Intervals are inclusive of the start instant and exclusive of the end.
...
Intervals have a fixed millisecond duration.
...

How the authors of this very rich library have come to the conclusion, that time intervals should have only a single representation, one can guess.

Regarding the handling of time intervals, I have my own conclusions. Handling time intervals is not a trivial task, there is a most correct way to handle time intervals, and it is best to handle time intervals correctly within an implementation process from the start on.

Morten Sabroe Mortensen


[1] Alternative conclusions apply when the systems in question are live, real-time systems where scheduling is a vital topic. In this case, the timeline is not man's time, but a CPU's global beats. This beat can be read accurately for a CPU running at 3 Ghz, it is possible to use the operating system to access the global CPU counter and which are incremented 3*10^6 = 3,000,000 times per second. Here it is of relevance to measure time in microseconds and nanoseconds. This precision is far beyond the capabilities of a typical operating system and its interrupt-driven clock, because each update usually drifts in multiple milliseconds and depending upon internal schedulation. The premises are different and a conclusion about what is good enough, is another.

[2] See http://joda-time.sourceforge.net/.

[3] See http://joda-time.sourceforge.net/api-release/org/joda/time/ReadableInterval.html.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.