Thursday, February 28, 2013

Java TimeUnit Conversion

When Java 5 (J2SE 1.5/5.0) was released in 2004, it included Doug Lea's Java concurrency library. This library includes functionality for e.g. scheduling threads at particular instants in time. Some corners of this library use an enumeration type named java.util.concurrent.TimeUnit, which continues to be present in Java SE 6, 7 and 8.

The time unit object represents nanoseconds, microseconds, milliseconds, seconds, minutes, hours and days. In the context of the concurrency library, a time duration is represented as an integer of type long interpreted relative to one of the time units. Conversion of time duration values can be done easily by using the method TimeUnit.convert(long,TimeUnit).

This method has an annoying property: It does not handle rounding. If a time duration of 800 microseconds is converted to milliseconds, the result becomes 0 milliseconds. This is not always desirable. In some cases it would be nice to have rounding applied and to end up with a result of 1 millisecond. Rounding is not offered by the time unit object.
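
A minimal illustration of the truncation, using the standard library directly:

import java.util.concurrent.TimeUnit;

long millis=TimeUnit.MILLISECONDS.convert(800,TimeUnit.MICROSECONDS);
// millis is 0 - the 800 microseconds are truncated, not rounded up to 1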

In general rounding is easy to implement. To convert hours to days using truncation is just an integer division by 24. To do the same conversion with rounding, first add 12 hours and then divide by 24 hours/day, giving a result in days. This is exactly what happens when rounding floating-point values to integers: First add 0.5 and then truncate the fraction, leaving only the integer part. Simple.

For the time unit object, this functionality seems to be hard to obtain. If the source time unit has a coarser granularity than the target time unit, then it is possible to perform the conversion by using convert() directly:

long targetDuration=targetUnit.convert(sourceDuration,sourceUnit);

Within the implementation, a simple multiplication is applied.

If the source time unit has a finer granularity than the target time unit, then TimeUnit.convert() will apply division and truncate. To get rounding, it is possible to convert 1 from the target unit space to the source unit space, and then add half of this to the source time duration before truncating:

long targetToSourceFactor=sourceUnit.convert(1,targetUnit);
long targetDuration=targetUnit.convert(sourceDuration+targetToSourceFactor/2,
                                       sourceUnit);

The very nice thing about this is that it works for all combinations of time units.

When put into a context of more complete code, the end result may be this:

import java.util.concurrent.TimeUnit;

public class TimeUnitUtility
{
  public static long convertWithRounding(long sourceDuration,
                                         TimeUnit sourceUnit,
                                         TimeUnit targetUnit)
  {
    long res;

    if (sourceUnit==targetUnit)
    {
      res=sourceDuration;
    }
    else
    {
      if (sourceDuration<0)
      {
        res=-convertWithRounding(-sourceDuration,sourceUnit,targetUnit);
      }
      else
      {
        int order=targetUnit.compareTo(sourceUnit);

        if (order<=0)
        {
          // The target unit is finer than or equal to the source unit;
          // plain conversion multiplies and loses no precision.
          res=targetUnit.convert(sourceDuration,sourceUnit);
        }
        else
        {
          // The target unit is coarser; add half a target unit before
          // the truncating division to obtain rounding.
          long targetToSourceFactor=sourceUnit.convert(1,targetUnit);
          res=targetUnit.convert(sourceDuration+targetToSourceFactor/2,
                                 sourceUnit);
        }
      }
    }

    return res;
  }
}
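
A short usage sketch, assuming the class above:

long a=TimeUnitUtility.convertWithRounding(800,TimeUnit.MICROSECONDS,
                                           TimeUnit.MILLISECONDS);   // 1
long b=TimeUnitUtility.convertWithRounding(400,TimeUnit.MICROSECONDS,
                                           TimeUnit.MILLISECONDS);   // 0
long c=TimeUnitUtility.convertWithRounding(-800,TimeUnit.MICROSECONDS,
                                           TimeUnit.MILLISECONDS);   // -1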

There exists another way to do the same thing. Instead of using compareTo(), it is possible to convert the time duration to the finest time unit and then convert this duration to the target unit. This involves three convert() operations:

import java.util.concurrent.TimeUnit;

public class TimeUnitUtility
{
  public static long convertWithRounding2(long sourceDuration,
                                          TimeUnit sourceUnit,
                                          TimeUnit targetUnit)
  {
    long res;

    if (sourceUnit==targetUnit)
    {
      res=sourceDuration;
    }
    else
    {
      if (sourceDuration<0)
      {
        res=-convertWithRounding2(-sourceDuration,sourceUnit,targetUnit);
      }
      else
      {
        // Convert to the finest granularity first; note that very large
        // durations saturate at Long.MAX_VALUE in this step.
        TimeUnit finestUnit=TimeUnit.NANOSECONDS;
        long finestDuration=finestUnit.convert(sourceDuration,sourceUnit);

        long targetToFinestFactor=finestUnit.convert(1,targetUnit);
        res=targetUnit.convert(finestDuration+targetToFinestFactor/2,
                               finestUnit);
      }
    }

    return res;
  }
}

These static implementations do not make it possible to address an abstract convert() method with two possible implementations - one performing truncation and the other performing rounding. One solution is to declare the abstract method in the form of an interface -

public interface TimeUnitConverter
{
  long convert(long sourceDuration,
               TimeUnit sourceUnit,
               TimeUnit targetUnit);
}

- and then provide two different implementations, which can be instantiated and passed around:

public class TruncateTimeUnitConverter implements TimeUnitConverter
{
  @Override
  public long convert(long sourceDuration,
                      TimeUnit sourceUnit,
                      TimeUnit targetUnit)
  {
    return targetUnit.convert(sourceDuration,sourceUnit);
  }
}

public class RoundTimeUnitConverter implements TimeUnitConverter
{
  @Override
  public long convert(long sourceDuration,
                      TimeUnit sourceUnit,
                      TimeUnit targetUnit)
  {
    return TimeUnitUtility.convertWithRounding(sourceDuration,sourceUnit,
                                               targetUnit);
  }
}
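
A brief sketch of how the two implementations might then be passed around; the useRounding flag is made up for the example:

boolean useRounding=true;  // chosen by the caller
TimeUnitConverter converter=useRounding
                            ? new RoundTimeUnitConverter()
                            : new TruncateTimeUnitConverter();
long millis=converter.convert(800,TimeUnit.MICROSECONDS,TimeUnit.MILLISECONDS);
// 1 with the rounding converter, 0 with the truncating converter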

However, it is also possible to parameterize the conversion and implement strict rounding as defined by java.math.RoundingMode. The set of rounding modes includes FLOOR, CEILING, UP, DOWN, HALF_UP, HALF_DOWN, HALF_EVEN and UNNECESSARY.

import java.math.RoundingMode;
import java.util.concurrent.TimeUnit;

public class TimeUnitUtility
{
  ...
  public static long convert(long sourceDuration,
                             TimeUnit sourceUnit,
                             TimeUnit targetUnit,
                             RoundingMode roundingMode)
  {
    long res;

    if (roundingMode==null)
    {
      String message=
        "Failure to convert; rounding mode must be set!";
      throw new IllegalArgumentException(message);
    }

    switch (roundingMode)
    {
      case FLOOR:
      {
        res=convert_FLOOR(sourceDuration,sourceUnit,targetUnit);
        break;
      }

      case CEILING:
      {
        res=convert_CEILING(sourceDuration,sourceUnit,targetUnit);
        break;
      }

      case UP:
      {
        res=convert_UP(sourceDuration,sourceUnit,targetUnit);
        break;
      }

      case DOWN:
      {
        res=convert_DOWN(sourceDuration,sourceUnit,targetUnit);
        break;
      }

      case HALF_UP:
      {
        res=convert_HALF_UP(sourceDuration,sourceUnit,targetUnit);
        break;
      }

      case HALF_DOWN:
      {
        res=convert_HALF_DOWN(sourceDuration,sourceUnit,targetUnit);
        break;
      }

      case HALF_EVEN:
      {
        res=convert_HALF_EVEN(sourceDuration,sourceUnit,targetUnit);
        break;
      }

      case UNNECESSARY:
      {
        res=convert_UNNECESSARY(sourceDuration,sourceUnit,targetUnit);
        break;
      }

      default:
      {
        String message=
          "Failure to convert; rounding mode \""+
          roundingMode.name()+
          "\" can not be recognised!";
        throw new IllegalArgumentException(message);
      }
    }

    return res;
  }
  ...
}
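
Assuming the elided convert_* methods implement their respective rounding modes, usage could look like this:

long up=TimeUnitUtility.convert(100,TimeUnit.MICROSECONDS,
                                TimeUnit.MILLISECONDS,RoundingMode.CEILING);   // 1
long down=TimeUnitUtility.convert(900,TimeUnit.MICROSECONDS,
                                  TimeUnit.MILLISECONDS,RoundingMode.FLOOR);   // 0
long half=TimeUnitUtility.convert(500,TimeUnit.MICROSECONDS,
                                  TimeUnit.MILLISECONDS,RoundingMode.HALF_UP); // 1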

For a complete context of these examples and including a test suite, please refer to the archive Yelstream-TimeUnitUtility_2013-02-28.zip. This archive contains an Eclipse Java project ready for import. All eight rounding modes are implemented for both positive and negative time durations and accompanied by unit test cases.

Wednesday, February 27, 2013

The Character Encoding Circus of XML-over-HTTP

Welcome to the Circus

Transmitting data in the form of XML over the HTTP protocol in an ad hoc manner is simple. Just put the XML data into the body of an HTTP request, send the request, and grab the returned XML data from the body of the returned HTTP response. Nothing could be simpler - or so it seems. The fact is that the process is somewhat elaborate and contains intricate details which only a few programmers get right.

When transmitting data over HTTP, it is a good idea to state the nature of the content. This is done by setting the Content-Type header to a MIME type. The header may be something like -

Content-Type: text/plain; charset="UTF-8"

This header may also define the character encoding. When the character encoding is not set within the context of HTTP, most textual values default to or must be ISO-8859-1. No matter which character encoding the Content-Type header specifies, some things must always be encoded using ISO-8859-1. This is the case for HTTP header values in general. It is even the case for the Base64 encoded content "<username>:<password>" found within HTTP Basic and set by clients in the Authorization header - this is not dependent upon any encoding set in the Content-Type header, contrary to popular belief among programmers. This is due to the origin and legacy of the HTTP protocol. If the state of things was different and a choice was present, the encoding ISO-8859-1 would not always be preferable over other encodings.

When the Content-Type does not specify a character encoding, different rules apply depending upon the MIME type used. In one case the XML content must be interpreted as US-ASCII, while in another case it must be interpreted as binary - reading the optional byte order mark (BOM) and processing the encoding of the initial processing instruction -

<?xml version="1.0" encoding="UTF-16"?>

This implies that the "encoding" declaration in the processing instruction may be unnecessary and ignored.

Should it happen that some character encoding is not set right, then it may not pose a problem if both the client and the server agree upon how to do things. It is often the case that clients and servers are not coded by the same programmers, or that the clients and servers have to be compatible with existing software, and in this case things must be done right. When the transmission of XML data over HTTP is not done right, the character encoding circus is open and may imply expensive debugging sessions or result in middleware which does not quite cut it and which does not work in all intended use cases.

Specifications

When transmitting XML, most professional programmers use the MIME type application/xml. There is another MIME type in common use - the type text/xml. While both types are intended for the transmission of XML, they do have different interpretations.

Specifications do exist. RFC 3023 specifies how five different media types for XML are supposed to be used. It is not easy to read, but it does contain strict answers.

Regarding the basic types text/xml and application/xml, RFC 3023 contains a section "3.6 Summary" with some clear statements. What applies to text/xml - among other things - is this:

  • Specifying the character encoding "charset" as part of the content-type is strongly recommended.
  • If the "charset" parameter of the content-type is not specified, then the default is "US-ASCII". The default of "ISO-8859-1" in HTTP is explicitly overridden.
  • An "encoding" declaration in the initial processing instruction of the XML, if present, is irrelevant.

It may well come as a surprise that the character encoding US-ASCII - and not the implicit encoding ISO-8859-1 of HTTP - is the default.

The summary also contains statements about the type application/xml:

  • Specifying the character encoding "charset" as part of the content-type is strongly recommended, and if present, it takes precedence.
  • If the "charset" parameter of the content-type is not specified, conforming XML processors must follow the requirements in section 4.3.3 of the Extensible Markup Language (XML) 1.0 specification ([XML]).

What this explains is that the content is either 1) to be read and written textually using the "charset" of the Content-Type header, or 2) to be read and written as binary using the "encoding" declaration in the initial processing instruction of the XML.

It is recommended to handle the XML textually by using the form -

Content-Type: application/xml; charset="UTF-8"

- containing a character encoding like "UTF-8" and in this case the "encoding" declaration in the initial processing instruction of the XML like -

<?xml version="1.0" encoding="ISO-8859-1"?>

- is to be ignored. To avoid confusion it would in fact look much better with -

<?xml version="1.0"?>

- since UTF-8 and ISO-8859-1 are not compatible or interchangeable.
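
As a hedged sketch of how a client might follow this recommendation - using plain java.net.HttpURLConnection, with the URL and payload being made up for the example:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class XmlOverHttpClient
{
  public static void post(String address,String xml) throws Exception
  {
    HttpURLConnection connection=
      (HttpURLConnection)new URL(address).openConnection();
    connection.setRequestMethod("POST");
    connection.setDoOutput(true);
    // State the character encoding explicitly in the Content-Type header:
    connection.setRequestProperty("Content-Type",
                                  "application/xml; charset=UTF-8");
    byte[] body=xml.getBytes(StandardCharsets.UTF_8);  // encode as declared
    try (OutputStream out=connection.getOutputStream())
    {
      out.write(body);
    }
    int status=connection.getResponseCode();
    // The response body should likewise be read using the charset stated
    // in the charset parameter of the response Content-Type header.
  }
}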

Section 4.3.3 of [XML] states how to read XML content from a binary stream, just as is done when reading from a file. First the BOM is read together with the initial XML processing instruction and its "encoding" declaration, and then the reader interprets the binary stream textually according to the byte order and the character encoding. Some of the details of how to do this can be read in another, quite interesting section, F Autodetection of Character Encodings (Non-Normative), of [XML].
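
A simplified sketch of the BOM part of that autodetection - covering only the UTF-8 and UTF-16 marks, whereas appendix F handles more cases - could look like this:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BomSniffer
{
  // Guesses the encoding from a byte order mark, if any; null means "no BOM".
  public static Charset fromBom(byte[] head)
  {
    if (head.length>=3&&
        (head[0]&0xff)==0xef&&(head[1]&0xff)==0xbb&&(head[2]&0xff)==0xbf)
    {
      return StandardCharsets.UTF_8;
    }
    if (head.length>=2&&(head[0]&0xff)==0xfe&&(head[1]&0xff)==0xff)
    {
      return StandardCharsets.UTF_16BE;
    }
    if (head.length>=2&&(head[0]&0xff)==0xff&&(head[1]&0xff)==0xfe)
    {
      return StandardCharsets.UTF_16LE;
    }
    return null;  // fall back to the "encoding" declaration or UTF-8
  }
}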

Data in the form of XML is to be handled differently when contained as text as opposed to a binary stream of octets.

Strange Behaviour and Incompatibility

Even though the specifications of text/xml and application/xml are more than a decade old - they are from the year 2001 and the infancy of the world of adopting SGML in its new incarnation of XML - the differences between the types are hardly common knowledge. These MIME types are far from the only ones used for XML, but both are in common use for many not-always-ingenious and proprietary ways to communicate XML.

Complex matters tend to become a circus of strange behaviour and incompatibility only when present and existing specifications are not followed and adhered to.

Once the circus is open, it includes some of the most unbelievable, unanticipated, hard-to-address, restrictive effects, and this is not even touching upon the example extravaganza present in the real world of application programming.

Should you ever hear statements about text/xml and application/xml being equal - or not in common use - then you have acquired one of many tickets to the circus.

Sunday, December 9, 2012

A Leap of Faith Abandoned

Performance

The six aspects of PRINCE2 project performance:

  1. Costs
  2. Timescales
  3. Quality
  4. Scope
  5. Risk
  6. Benefits

Principles

The seven PRINCE2 principles:

  1. Continued business justification
  2. Learn from experience
  3. Defined roles and responsibilities
  4. Manage by stages
  5. Manage by exception
  6. Focus on products
  7. Tailor to suit the project environment

Themes

The seven PRINCE2 themes:

  1. Business case
  2. Organization
  3. Quality
  4. Plans
  5. Risk
  6. Change
  7. Progress

Roles

The PRINCE2 roles and responsibilities:

  1. Project Board
  2. Executive
  3. Senior User
  4. Senior Supplier
  5. Project Manager
  6. Team Manager
  7. Project Assurance
  8. Change Authority
  9. Project Support

Processes

The seven PRINCE2 processes:

  1. Starting up a Project (SU)
  2. Directing a Project (DP)
  3. Initiating a Project (IP)
  4. Controlling a Stage (CS)
  5. Managing Product Delivery (MP)
  6. Managing a Stage Boundary (SB)
  7. Closing a Project (CP)

Product Description outlines

The 12 PRINCE2 baseline management products are subject to change control:

  1. Benefits Review Plan
  2. Business Case
  3. Communication Management Strategy
  4. Configuration Management Strategy
  5. Plan
  6. Product Description
  7. Project Brief
  8. Project Initiation Documentation (PID)
  9. Project Product Description
  10. Quality Management Strategy
  11. Risk Management Strategy
  12. Work Package

The six PRINCE2 records are dynamic management products:

  1. Configuration Item Record
  2. Daily Log
  3. Issue Register
  4. Lessons Log
  5. Quality Register
  6. Risk Register

The eight PRINCE2 reports are management products providing a snapshot of status:

  1. Checkpoint Report
  2. End Project Report
  3. End Stage Report
  4. Exception Report
  5. Highlight Report
  6. Issue Report
  7. Lessons Report
  8. Product Status Account

Monday, July 16, 2012

Morpheus

The MOS Technology 6502 and the Commodore 64

Ever heard about the micro computer madness which took off in the early 1980s? All kinds of personal computers began to be accessible to ordinary people at home.

CPUs

This includes micro computers built around all kinds of 8-bit CPUs designed during the 1970s and dominated by the Zilog Z80 (Z80) and the MOS Technology 6502 (6502), which was an offspring of the Motorola 6800. At around the same time, the IBM Personal Computer (PC) had become popular too. The PC was much less common among home users due to its much higher price. The PC used the Intel CPUs 8086, 8088 and 80286, which were all 16-bit micro processors. At the time, the PC was not comparable to the 8-bit micro computers and did not become common among ordinary people at home until around 1990.

The current MOS Technology 65xx micro processors are made by a company named Western Design Center (WDC).

Commodore International

Many micro computers came into existence. A number of these were produced by Commodore International. Their first home computer was the Commodore PET from 1977. They introduced the Commodore VIC-20 in 1980, which became available at an affordable price in radio stores and warehouses. It was a great machine, but somewhat limited, mostly due to having only 5 KiB of RAM.

A nearly identical form factor to the Commodore VIC-20 was used for the Commodore 64 (C64) from 1982. This had 64 KiB RAM. This amount of memory was somewhat expensive, but Commodore and Jack Tramiel - who died in the spring of 2012 at the age of 83 - had foreseen that RAM prices were falling. The Commodore 64 became the single best-selling computer model of all time, a result of the machine having a unique combination of sound and graphics chips together with an adequate amount of memory. All I/O was memory mapped on top of RAM and ROM at up to six levels. The software was present, but nothing special. Later models were the Commodore 16, the Commodore Plus/4 (which was compatible with the Commodore 16) and the Commodore 128 (which included full hardware compatibility with the Commodore 64), which were all 8-bit computers. Later came the Commodore Amiga in multiple models. This was a 16-bit computer with an actual operating system. It was highly popular, but it never topped the popularity of the Commodore 64.

The Commodore 64 was released in August 1982 and contained the MOS Technology 6510 processor - essentially identical to the MOS Technology 6502 - together with a number of dedicated chips for audio, graphics and timing. The original 6502 processor contained as few as 3,510 transistors. The high-end CPUs of today contain on the order of 5,000,000,000 transistors.

How Commodore absorbed MOS Technology and by the mid 1980s had effectively abandoned further development of the 6502 in favour of producing micro computers is a story in itself.

The Life of the Commodore 64

All kinds of applications became available for the C64. This includes office applications and games. The first computer games were somewhat simple. They used the internal MOS Technology SID sound chip and the MOS Technology VIC-II graphics chip with 16 colours and eight hardware sprites in a somewhat simple manner.

Things changed during the 1980s. The hardware was pushed well beyond its original intentions. Colours were manipulated, and all kinds of raster manipulations were made. This included advanced sprite multiplexer routines to produce more sprites than the eight originally intended - an arbitrary number in principle; for practical applications the limit was around 24-48, for the visible area around 72, and including the borders up to a total of 112, but never more than eight on the same horizontal line.

To study these things, books and printed magazines were a much needed resource. The most important book to get one's hands on for a description of all the basic stuff and all the registers of the machine was "Commodore 64 Programmer's Reference Guide". The game of pushing the hardware became more and more elaborate - in fact it has continued to this day and has been exploited by the demoscene. Today, both all the basic and all the obscure exploits of the machine can be found on the internet.

Since the C64 was mostly used for gaming, the world of computer games is a good reference for a timeline. A selection of games is this:

  • Bruce Lee (1984)
  • Impossible Mission (1984)
  • Paradroid (1985)
  • Uridium (1986)
  • Paradroid - Competition Edition (1986)
  • Delta (1987)
  • Morpheus (1987)
  • Maniac Mansion (1987)
  • IK+ (1987)
  • Hunter's Moon (1987)
  • Zynaps (1987)
  • Nebulus (1987)
  • The Last Ninja (1987)
  • Katakis (1988)
  • Armalyte (1988)
  • Hawkeye (1988)
  • The Last Ninja 2 (1988)
  • Zak McKracken and the Alien Mindbenders (1988)
  • Impossible Mission 2 (1988)
  • Citadel (1989)
  • Paradroid - Metal Edition (1989)
  • Turrican (1990)
  • The Last Ninja 3 (1991)
  • Turrican II (1991)
  • Enforcer (1992)
  • First Samurai (1992)
  • Fred's Back (1993)
  • Mayhem in Monsterland (1993) - "Variable Screen Positioning"
  • Heavenbound (1994)
  • Newcomer (2001)
  • Metal Warrior 4 (2003)
  • Metal Dust (2005) - a SuperCPU-specific game

This selection deliberately indicates a concentration of titles around 1987-1988. During the life of the C64, more than 10,000 titles were released.

The Classical Period - 1982-1988

After a few years beyond the initial release of the C64 in 1982, with simple games of the Frogger, Asteroids and Defender type, the better-to-good things came. Programmers knew the platform, people owned it and knew it in minute detail. For a period of five to six years, the software created exploited the hardware better and better. This went on until 1987-1988, at which time the most sophisticated games came out. This is the end of the classical period, in which personalities like e.g. Andrew Braybrook and Martin Walker peaked.

The Extended Period - 1988-1994

Beyond the classical period dominated by 8-bit home micro computers, and as the 16-bit micro computers rose, the C64 had an extended period of life from 1988 and to this day in 2012. Some of the late coders and the most skilled people continued to code for the C64. Good things continued to be released - things with extraordinary scroll and graphics techniques like First Samurai, Fred's Back and Mayhem in Monsterland.

The Period of Enthusiasts - 1994-2012

After the bankruptcy of Commodore in 1994, the world of the C64 was no longer driven by commercial business, but by enthusiasts. The demoscene has lived on to this day, but may by now have squeezed the very last colour, the last byte, and the last, current nuvie video player out of the machine. The period led by enthusiasts has led to new hardware, new games, fantastic software emulators and new hardware implementations.

Mental Procreation

The classical period had many heroes. It was a time when a single individual could make a computer game and inspire a whole world of followers. Many such individuals existed, but some stood out more than others.

Andrew Braybrook

One such individual was a programmer named Andrew Braybrook. He was skilled beyond most people and had a physical appearance with a big smile, long hair and large glasses. He also had humor and often worked together with another, clever guy named Steve Turner.

Zzap!64

Andrew appeared in multiple printed magazines, which back in the 1980s were a main resource of inspiration. One of the magazines was the British Zzap!64. Some of the historical content can be accessed online. Andrew appeared twice in a series called Diary of a Game.

Paradroid

The first appearance was in The Birth of a Paradroid. This was about a high-tech game named Paradroid, which is one of the best C64 games ever made. Contrary to many of the games for the C64, this one was not a one-way, scripted game, but had the character of a maze which the player could explore freely. The game was also a multi-directionally scrolling game, and it scrolled not only the character map, but also the colour map. This was hard to do with little processor power and is presumably the reason why "only" 16-17 of the 25 lines are moved.

The game represents a masterpiece in what could be crammed into the memory of the C64 and was state-of-the-art anno 1985.

Uridium

In 1986 Andrew made the fast, arcade-like shoot'em up game Uridium, which once again very carefully pushed the computer to its limits. The Dreadnoughts of the game can be seen in the review of Uridium in Zzap!64. The game is supposedly the very first scrolling game to present a background with a fixed starfield. This too was a fantastic game.

This game can still be played through emulators, but it may be hard to find an emulator which does synchronize the update of the raster with the actual screen. The original ran flawlessly.

Morpheus

In 1987 the second diary made by Andrew was for a game called Morpheus and appeared under the title Mental Procreation. Once again the diary ran for multiple issues of the monthly Zzap!64 magazine.

At first glance - and in particular today - it may appear as if there is a lot of nonsense in the text of this diary. There is - but it can be completely disregarded once some of the thoughts about how to put functionality into the game are presented. For one thing, there was something about Andrew not wanting the directional scrolling to be based on a simple XY-based scheme (which tends to produce animations where things come to a halt in a grim way, stopping along one of X and Y before the other) but on polar coordinates. Who used polar coordinates for anything back in those days?

From the article where Andrew explains something about the unit circle split up into 256 directions, you can just imagine that he preserved memory by making a table with two bytes per direction and keeping only one eighth of it, leading to a table of only 32*2 bytes. No matter what the truth is here, the article was a great inspiration to many people.
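
As a purely illustrative sketch - in modern Java rather than 6502 assembly, and without the speculated one-eighth compression by symmetry - such a direction table could be built like this:

public class DirectionTable
{
  // 256 directions around the unit circle; one (dx,dy) pair per direction,
  // scaled to fit signed bytes.
  static final byte[] DX=new byte[256];
  static final byte[] DY=new byte[256];

  static
  {
    for (int d=0;d<256;d++)
    {
      double angle=2*Math.PI*d/256;
      DX[d]=(byte)Math.round(127*Math.cos(angle));
      DY[d]=(byte)Math.round(127*Math.sin(angle));
    }
  }

  // Per-frame scroll deltas for a given direction - no XY-only stops.
  public static int dx(int direction) { return DX[direction&255]; }
  public static int dy(int direction) { return DY[direction&255]; }
}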

The diary contains lots of other goodies, like talk and indications about how sprites are compressed, how transitions between characters and sprites work, and how sprites are reserved. You have to be a Commodore 64 programmer to understand and value this.

Morpheus had less immediate appeal compared to the fast scrolling shooter Uridium. The game also requires a manual to be understood. Patience is easily lost before an insight into the depths of the game is reached. While Paradroid and Uridium were ported to multiple 8-bit and 16-bit platforms, Morpheus was not. While the game may have been a commercial failure, it appears to be one of the best technical masterpieces of the classical period of the C64.

Fragments of the history of Graftgold, Hewson, Steve Turner and Andrew Braybrook can be found online, as can interviews with Andrew and Steve and comments about their transition from 8-bit to 16-bit.

According to what can be found on the internet, it appears as if Andrew still programs - he has been working as a software developer for an insurance company. His past performance was incredible.

Andrew Braybrook - one of the most talented C64 programmers.

Protovision and Jeri Ellsworth

The original hardware from Commodore Business Machines was produced in a number of revisions from 1982 to 1994. The height of the gaming period for 8-bit machines and the C64 was from 1982 until around 1988, when 16-bit machines like e.g. the Commodore Amiga took over. Prices and popularity fell with time after 1988 and until Commodore went bankrupt in 1994. During the last part of the 1980s, home computers switched from 8-bit to 16-bit. Commodore themselves made the series of Amiga computers, but ended up mishandling the different models and finally lost to the competition, the rise of the modern PC, and time. Today, no hardware platform like the C64 stays in production for as long as 12 years.

Protovision

In the historical movie War Games (1983), a young hacker named David Lightman (Matthew Broderick) tries to hack into a game company called Protovision with the intention of playing their games. In the movie, this was a fictional company.

A company actually named Protovision has existed and appears to still exist. It is a German company founded in 1997, and it does game development and game distribution. This company is one of several keeping the C64 alive and producing hardware and software for the machine. How the old electronics of the machines built between 1982 and 1994 happens to be still functional is hard to tell, but some of them continue to work. Among other things, the company distributes an add-on with two additional joystick ports and a number of four-player games. C64 enthusiasts continue to develop new hardware, including Ethernet cards, specially adapted hard disks and flash card interfaces.

The SuperCPU

During the lifetime of the C64, different cartridges containing a compatible replacement 65C02 CPU running at e.g. 4 MHz were produced in the 1980s, but they never gained popularity. Since everything inside the C64 is memory mapped and accessible through the expansion port used by cartridges, the idea was clever.

In the mid 1990s, the company Creative Micro Designs (CMD) produced a large cartridge called the SuperCPU. This cartridge contained a replacement CPU - a WDC 65C816 (65816), which is a 16-bit processor compatible with the 65C02 and the 6502 and capable of addressing 16 MiB of RAM. The 65C816 does not have a linear address space, but a 24-bit address space segmented into 256 banks with 64 KiB of memory in each. The processor in the SuperCPU ran at 20 MHz and could execute all original C64 programs, at least to the extent that these were not tied too hard to the original 1 MHz timing of the original 6502 and the raster of the VIC-II. Few programs were made specifically for the SuperCPU. The SuperCPU did not operate without a C64 host, since the host had the chips for sound, graphics and input devices.

Protovision continues to have a set of pages about a shoot'em up named Metal Dust, made specifically for the SuperCPU. The pages contain a fascinating trailer and a story about the making of the game, including the programmers Stefan Gutsch and Chester Kollschen. This game appears to have been produced at some time between 1998 and 2005 and was released in 2005.

The SuperCPU is no longer produced and can hardly be obtained. A most clever device.

Software Emulators

Something else happened in the mid 1990s - software emulators began to appear. These have been refined in numerous ways and the best of today are cycle exact and reproduce the original computer in all cumbersome detail. To this day these emulators keep life in the C64 and its very large software base.

One of the best of these is the Versatile Commodore Emulator (VICE).

The RAM Expansion Unit

Commodore in the mid 1980s made a range of RAM cartridges known as a RAM Expansion Unit (REU). This refers to specific models 1700, 1750 and 1764 having from 128 KiB to 512 KiB of RAM. Creative Micro Designs made a special version 1750 XL having 2 MB of RAM.

Back then memory was still expensive and the units were produced in smaller numbers. The way these REUs were mapped in memory allowed for a total of 16 MiB of RAM, and emulators like VICE implement this in all detail. In fact, the emulation of REUs is cycle-exact - including the direct memory access (DMA) offered by these units. The C64 itself can not move memory around like the DMA of the REU. Memory can be moved around one byte at a time by painstakingly performing load and store instructions over multiple cycles. The DMA offered by the REU can move one byte per cycle, which is much faster than what is possible on the native C64. By moving memory back and forth between the memory of the C64 and the REU, it is possible to move memory around at about five times the speed of what is possible without the REU.

The expansion units were originally used mostly by the GEOS operating system. Due to the chip shortages of the 1980s, the REU was produced in small numbers, which led the company behind GEOS to produce an alternative cartridge named geoRAM. The geoRAM was much slower in operation, since it did not offer DMA like the REU.

The C64 is often credited with starting the computer subculture known as the demoscene. Today, the demoscene takes advantage of the original REUs in the form of VICE (which can emulate a number of other, possibly less common types of RAM expanders). For some expressive examples, look up BLUREU by Crest and find examples of nuvie players.

Sadly, since the REUs were uncommon among ordinary C64 owners, no games really take advantage of REUs (except for a few, some of which are possibly still available from Protovision). With a native C64 machine, it is very hard to populate the 16 MiB RAM of the REU, but using VICE it is easy. Using an original REU to do DMA back and forth, memory moves and e.g. scroll effects can be done more effectively than on the native C64.

The Commodore 64 Direct-to-TV

In 2002 the initial version of a single-board computer named CommodoreOne (C-One) was made. The board was intended as an enhanced C64, but was reengineered to allow cloning of other 8-bit computers. The default CPU was the 65C816 processor equal to the processor of the SuperCPU. Today, the board is also called the C-One Reconfigurable Computer. The C-One was designed by Jeri Ellsworth and Jens Schönfeld, and there appears to exist active communities around the board and its different configurations.

Jeri Ellsworth - who happens to be a self-taught computer chip designer - went on and designed the Commodore 64 Direct-to-TV (C64DTV), which is a single-chip implementation of a C64 housed in a joystick modelled after the very popular Competition Pro joystick of the 1980s.

The C64DTV came out in 2004 and 2005 and was intended as a toy. It runs on batteries and has only two cables, for video and audio. Internally, it emulates a 6510 CPU, a VIC-II, a CIA and more in a single ASIC. There were different versions with slightly different functionality and RAM sizes from 128 KiB to 2 MiB. They all had a 2 MiB ROM or flash containing 30 original, pre-installed games managed by a dedicated software loader.

The electronics inside is dirt cheap, but has been dissected in great detail by the C64 community. Hackers can change the ROM and add other games. Modders have scrutinised the electronics and have found internal connectors for keyboard, joysticks, floppy, power and more. Many modifications of the C64DTV joystick and its electronics exist.

The C64DTV contains new functionality not found in the original. It has DMA, it has a blitter for fast image transfer, it has 256 colours, and even new graphics modes including a 320x200 mode with 8 bits per pixel (one might well assume this is placed in the second 64 KiB bank of the RAM). The C64DTV contains some bugs and faults relative to the original C64. This includes possible problems with how the keyboard is read and it includes some faulty wiring of the circuitry for handling the colours, which happens to be fixable by enthusiasts.

Even though the electronics is cheap, the C64DTV has become very well known and even adopted in VICE as a special type of C64. The demoscene has also taken advantage of the C64DTV and its new possibilities. The toy was produced in 250,000 units. Sadly, it can no longer be bought.

According to the internet, in early 2012 Ellsworth was hired by Valve Corporation (along with several other notable hardware hackers) to work on gaming hardware. Presumably, this might well be something beyond making C64 compatible devices.

The Commodore 64x

In 2011 the current owners of the Commodore brand made the Commodore 64x (C64x). This is basically a 64-bit Intel x86 home computer within a cabinet looking somewhat like the original C64. The hardware is a well-chosen ATX board but has nothing in common with the original. The keyboard has been adapted to be more modern and PC-like, and the original graphics of the keys is missing.

The C64x is a great gimmick, but does in itself and without a software emulator not provide native C64 compatibility.

The Turbo Chameleon 64

Regarding native compatibility, something most interesting exists. In what appears to be direct relation to the work on the C-One, a man named Peter Wendrich has produced a C64 compatible cartridge named Turbo Chameleon 64. The cartridge provides VGA and keyboard connectivity and has a slot for an SD flash card. To produce the image for the VGA signal, the cartridge emulates the VIC-II completely by bus snooping - which is quite a nice trick. It emulates a 6510 at on the order of 10+ times the native speed. It also has a cycle exact emulation of the original series of REUs. It has lots of other features, including the implementation of a second SID chip to provide stereo audio (equal to a lot of the modifications of original C64's done by enthusiasts). Not to mention the emulation of a lot of the original cartridges of the C64 era.

The cartridge is so complete in its emulation that it can even run in stand-alone mode without a C64 host. The cartridge has an optional docking-station with four joystick ports and keyboard connectivity (possibly equal to the add-ons sold by companies like Protovision and Individual Computers).

It appears that this rich platform continues to be in active development. There even exist firmwares for other platforms than the C64 - mentioned are e.g. the ZX Spectrum.

This cartridge is quite extraordinary.

The Future of the Commodore 64

Now, what could the future bring? Is there a future at all?

Well, die hard fans of the C64 will continue to exist for as long as they can remember their childhood dreams. They will keep the platform alive in the form of software emulators and hardware implementations and continue to run the estimated 10,000 titles made for the platform. Possibly, on a rare occasion, a new game will be made; using just simple, present extensions like the old, existing REU and its DMA, some possibilities for making something new do still exist.

Keeping the compatibility with the native C64 is of the utmost importance, since this touches upon the mentioned childhood dreams. The community around the 6510/6502 processor is alive, since many companies continue to use the processor in all kinds of embedded products. In fact, it is said that there have never been produced as many 6502 processors as right now.

How about expanding aggressively upon the processor capabilities - while gaining access to a large, linear address space? Why not go beyond the 16-bit processor of the SuperCPU and its 16 MiB address space and continue with a 32-bit processor and a 4 GiB address space?

The Next Generation C64

The best, very cool thing for a Next Generation C64 (C64-NG) would be a 32-bit processor with 6502 compatibility, providing access to a 4 GiB linear address space. It may not run very fast, perhaps in the range of 10 to 100 MHz, still intended for coding close to the metal, and still intended for low power. Imagine this manifested in a smaller form factor not unlike some of the devices already made.

What would the requirements to such a processor be?

Well, within each bank of 64 KiB of memory, the original 8-bit instruction set should operate with all the original opcodes. Otherwise all the original programs and all the known routines would not operate. The stack, allowing for only 256 bytes, would possibly need to be moved out of its default location at [$0100;$01FF] and to an upper 64 KiB memory bank, but then again if the stack is enlarged then e.g. the stack pointer would have to be widened too. Then a new set of instructions addressing 32-bit functionality and a 4 GiB linear address space should be present. If such a system were to be built and connected with older hardware, it could become important to be able to connect with the old, 8-bit based address busses of the original C64.

Does such a 65xx processor exist?

Well, the current state-of-the-art 6502 compatible processors seem to be made by WDC. The faster 8-bit 6502 processors of today from WDC run at 20-200 MHz. The faster 16-bit 65C816 processors run at 20-100 MHz.

WDC once made specifications for a 32-bit successor to the 8-bit 6502 called the W65T32 Terbium. This is from the mid 2000s and was thought to be the next generation 65xx processor. It seems that this processor has never been finalized and produced. A complete instruction set does appear to exist - if looking hard on the internet, some specifications can be found. The W65T32 Terbium processor has compatibility with both the 8-bit 6502 and the 16-bit 65C816. It continues to be intended for low power system designs.

It is a striking thought that a 32-bit, 6502-compatible processor is 10 to 20 years too late for commercial applications and is never going to be produced. Different kinds of ARM processors in 32-bit and 64-bit implementations have for a long time ruled the world and will continue to do so. For commercial applications, the 65xx platform may end up dying with the 8- and 16-bit embedded world.

There still exists a large interest in the 65xx platform. When examining the subject, it becomes evident that multiple other efforts have been made to formulate a 32-bit 65xx platform:

  1. A processor called 65GZ32, which appears to be RISC-like in nature and not really faithful to the original 6502.
  2. The 6502 microprocessor resource mentions 65Org16 and 65Org32, which both are referred to from "65Org16 thread summaries".
  3. 65k.
  4. T65.
  5. Extended 6502 for 8-bit and 32-bit embedded systems (6502EX) and which appears to have a mode named 6502EX-VS and possibly suitable for an old, physical C64.
    Actually, this 6502EX appears to be worth examining in further detail.

At a quick glance it is unclear which of these custom processors provide compatibility with the instruction set of the 8-bit 6502, and which just expand all addresses and registers without keeping the original instructions. However, it is clear that thought has been given to the matter. It would be interesting to know more about which of these 32-bit 65xx platforms have backing from the 65xx community.

The C64-NG could look like this:

  1. A 32-bit 65xx processor compatible with the original 8-bit 6510 and providing access to a 4 GiB linear address space.

    All I/O of the original C64 should continue to be mapped in the first 64 KiB memory bank.

    Possible modes should include i) a cycle exact, 8-bit based, 1 MHz mode together with ii) a mode running at a full blown frequency somewhere around 10-100 MHz.

  2. Tweaks and extensions compatible with the MOS Technology SID chip and its mapping in the original C64.

    This could include a second SID chip for a stereo option and while keeping the original sound.

    Likely some interesting ideas regarding sampling rate and effective bits and channels exists and have already been played around with on the original C64 hardware.

  3. Tweaks and extensions compatible with the MOS Technology VIC-II chip and its mapping in the original C64.

    A great gimmick would be a second sprite engine to support more sprites, as a replacement for the advanced, somewhat error-prone sprite multiplexer solutions. These sprites should be of the known, standard 24x21 kind and up to some larger count like ... 64, 128 or 256. Possibly including a better, revised scheme for the detection of sprite-to-sprite collisions. This would extend the sprite capabilities but keep the graphics recognizable and the second sprite engine usable within the strict limits of the 64 KiB of the original C64. To what extent this should be cycle exact relative to a processor and its addressing of memory and busses, and whether it would steal CPU cycles or not ... well, this may require some further thoughts.

    The possibility to edit the standard 16 colours and select these from a larger palette of colours.

    New colour modes not faithful to the original C64 - perhaps starting with a 320x200 pixel, 8-bit per pixel mode like the C64DTV. Sprites and the VIC-II may have nothing to do here, but as a replacement some blitter functionalities would be nice.

  4. The addition of more advanced memory handling schemes to move memory around fast. This should include movement of memory within a single 64 KiB memory bank. Relative to physical busses, could this be made faster than the DMA of the REU, which moves one byte per cycle?

  5. Regarding the physical connections of such a device, many possibilities exist. Existing cartridges can be used for inspiration.

Would anybody use such a 32-bit C64?

Well ... what do you think? If you ask the hardcore enthusiasts, the answer is quite clear, since discussions about the subject continue to go on to this day. Many C64 enthusiasts fancy the unattainable SuperCPU of the mid-to-late 1990s. Hardcore fans of the 6502 processor continue to talk about a version with a large address space. The 32-bit capabilities would appeal to enthusiasts within both the world of the C64 and the world of 65xx embedded programmers - individually.

A blast from the past. Time is hard upon things. This includes the products of our heroes like Andrew Braybrook and the raw metal of the C64.

The C64 is nostalgia at its best. 30 years and counting.

Morpheus, the god of dreams, has the ability to take any human form and appear in dreams.

Monday, June 25, 2012

Two-Factor Authentication

Introduction

Two-factor authentication (TFA) is often used to let users authenticate themselves to computer systems in a safe way. It requires the use of two authentication factors out of three possible:

  • Something the user knows - e.g. a password, a code or a PIN number.
  • Something the user has - e.g. a smart card or a hardware token.
  • Something the user is - e.g. a biometric characteristic - possibly a fingerprint.

By itself, two-factor authentication is nothing new. However, many of the applications are more recent.

The Golden Standard - RSA SecurId

At some time around 1980 a company was granted a significant US patent. The patent had a long description, but the essence of the content was how two parties could agree upon a token without ever communicating, just by agreeing upon the current point in time. Today, this company is known as RSA Security.

This idea led to the RSA SecurID token - something a user has - and which is used in combination with something a user knows to implement two-factor authentication.

The token is personal in that it has been factory initialized with a personal key together with the current time. It is basically a digital watch implementing a proprietary hash algorithm. These tokens contain no serviceable parts. The time in even the cheapest digital clock never drifts unless the temperature is changed significantly. No resetting of any kind is ever necessary. Once initialized, the circuitry can run for multiple years up until some date of expiration. Once the battery is empty, the tokens are disposed of.

For 30 years RSA Security has had a great business selling their custom two-factor authentication solutions. Until not too long ago, these tokens were expensive. RSA Security went for the biggest customers only. As part of selling their products, they made their customers sign a non-disclosure agreement including statements about the customer promising not to try to reverse engineer the products sold. However, some individuals around the internet have done some reverse engineering - possibly starting with some Russian engineers who are experts in electronics.

Many other companies have tried to follow in the tracks of RSA Security. However, US patents last for 20 years. In order to circumvent the expiration of the original patent, it has been covered with newer, overlapping patents which, once granted, effectively extend the reach of the original patent. The US patent law does not really try to honour the publication of private inventions by granting a limited, exclusive right to use. The European patent law is quite different from the US patent law, but let this be for now. Competing companies have for a long time had to invent other ways to implement two-factor authentication. Since the idea of using the clock and the timeline has been covered by patents and could not be used, different kinds of other strategies have been put to use. Many of these have involved hardware tokens where the token and the server must be kept in synchronization relative to the number of tokens used. Many of these alternative solutions happen to appear somewhat clumsy during actual use.

Today, the RSA SecurID token exists in many forms. The basic idea continues to be the same. Since RSA SecurID tokens have existed for a long time and the solution with using the time is very elegant, this has by many (including RSA Security itself) been called the golden standard of two-factor authentication.

Since the solution is the golden standard and its use has been implemented all over the world and by many large corporations and organizations, it has been the target of much scrutiny from researchers and the like. One possible example of the outcome of this is exposed in Scientists crack RSA SecurID 800 tokens, steal cryptographic keys. This indicates that the effective security of the solution may possibly degenerate over time.

As happens with e.g. common, NIST-approved hash functions and public-key cryptography, which are weakened over time as a result of research and scrutiny, nothing lasts forever.

Danish NemID

Since July 2010 there has existed in Denmark a common two-factor authentication scheme called NemID (translated: EasyID). Initially, the "has" part was something as simple as a small card with printed numbers.

This is developed by the company DanID and is used by Danish internet banks, government websites and private companies.

At first, this somewhat low-tech solution was disappointing. Technically, it does not impress. However, as compared to digital circuitry, the solution has some immediate benefits, since it helps implement two-factor authentication right away and without any kind of compatibility problems which may easily be introduced as a result of using electronics. Additionally, one-time pads generated from a random source of input happen to be the safest possible codes to use.

The cards with printed numbers do expire, and users need to have new cards issued at regular intervals. Some more recent developments have led to a hardware token, which can optionally be used in place of the card.

This does not expire as fast as the card. The physical appearance of the token, with its form factor and its rounded corners, does happen to give associations to some of the RSA SecurID tokens.

What the future may hold for NemID is hard to tell. For one thing, I have recently read that DanID has tried to make their NemID work on mobile terminals. Apparently, some of their first initiatives have not been backed with enough money for the implementation. I have read some indications of this in Eight million DKR to NemID on the cell phone was not enough. Possibly, three times the amount of money happens to be necessary to implement a solution for mobile terminals.

Yelstream SecurePIN

Back in the year 2003 someone told me that it was impossible to implement something similar to RSA SecurID on mobile terminals. This I had to explore.

At around 2003, one of the most common platforms for mobile terminals was Sun's MIDP/CLDC - a set of profiles within the Micro Edition of Java. Nokia had many variants of a smaller series of terminals with colour screens called Series 40. SonyEricsson had introduced a model called T310 which was crammed with functionality and which was one of the very first models with Java and colour allowing for a decent sized program to run. These generations of mobile devices had very fragmented memory, of which some was available to user programs. Devices like these allowed for a program of a static size of 50-64 KiB and a runtime size of 96-160 KiB. To reduce the size of compiled programs, it was necessary to apply hard obfuscation. My solution used around 35 KiB of static space.

After studying, seeking information, asking around and trying out a bit, I came up with a solution named SecurePIN, built around the concept of a personalized hash function.

This uses a hash-based message authentication code (HMAC) to calculate a message authentication code. The calculation needs a secret, personal key in combination with the actual input. The cryptographic strength of the HMAC depends upon the strength of the underlying SHA-1 hash function. The actual input used was the current time modulo some chosen amount of time. The output of the hash function had to be processed carefully to reduce it to a number of decimal digits.
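
A minimal sketch of such a construction in plain Java SE could look like the following - the HMAC-SHA-1 primitive is standard, while the time step and digit count are parameters, and the reduction to decimal digits uses the dynamic truncation known from RFC 4226, not necessarily what SecurePIN did:

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class TimeBasedCode
{
  public static int code(byte[] personalKey,long timeStepMillis,int digits)
    throws Exception
  {
    long counter=System.currentTimeMillis()/timeStepMillis;  // current time slice
    byte[] message=new byte[8];
    for (int i=7;i>=0;i--)
    {
      message[i]=(byte)counter;  // big-endian encoding of the time slice
      counter>>>=8;
    }
    Mac mac=Mac.getInstance("HmacSHA1");
    mac.init(new SecretKeySpec(personalKey,"HmacSHA1"));
    byte[] hash=mac.doFinal(message);
    // Reduce the 20-byte HMAC output carefully to a few decimal digits:
    int offset=hash[hash.length-1]&0x0f;
    int binary=((hash[offset]&0x7f)<<24)
              |((hash[offset+1]&0xff)<<16)
              |((hash[offset+2]&0xff)<<8)
              | (hash[offset+3]&0xff);
    return binary%(int)Math.pow(10,digits);
  }
}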

This ended up with a fully functional prototype as described in more detail here: SecurePIN.

The difficulty with which this implementation was crammed into the phones using standard libraries is hard to imagine for the common Android and iPhone developers of today. At the other end, embedded programmers would try to state that a few KiB would be enough both now and then, which of course is not right.

Putting the "has" part of two-factor authentication into a mobile terminal has advantages over propritary tokens. For one thing, people already carry their mobile phone with them. For another thing, it is possibly to include MANY software tokens without having to carry the same amount of physical tokens. As a third thing, it is as part of the personalized initialization process possible to set the length of output.

Have you ever tried leaving home on your bicycle and riding to work with just your mobile phone and your house key?

Using a common platform in favour of proprietary, epoxy-sealed tokens may open up for unwanted access. For one thing, it may be possible to steal a person's phone, set the clock of the phone to some time in the future - say, a day later - and read the token for this day, set the clock back to normal, hand the phone back without the owner knowing that it was ever gone, and then a day later apply the PIN code without actually having the phone.

Should something like SecurePIN be implemented today, a number of things should be considered. While the hash function SHA-1 was safe before 2005, multiple security flaws have been found. For possible consideration is the set of SHA-2 functions or even the SHA-3 functions (to be finalized by NIST in 2012).

Using an encrypted keystore to save personalized, secret keys could be a good idea. Very much depending upon the purpose, it may be a kludge to force the user to remember and enter yet another password to open up the keystore. But it could be much safer if the device were ever lost. Classical keystore handling tends to be CPU-intensive.

Other constructions are possible. Whatever these other constructions and improvements could be, it would be extremely important to test the randomness of the produced numbers by applying a common chi-squared test.

Other important subjects for consideration are the common negligence of users and the procedures for how to perform personal initialization and issue secret keys in a safe manner.

It most certainly was possible to implement two-factor authentication by using the timeline.

Sunday, June 3, 2012

Manifesto of Time Intervals

Introduction

This is a note about the handling of time intervals within IT systems. The note contains observations of a general nature only.

Within multiple customer projects, I have experienced the specific effects at all levels of detail and for all kinds of reasons. I have seen the handling of time intervals go wrong and lead to the most obscure, complex code, which only the most highly skilled programmers can correct.

General

In a typical IT system, there is much data that is valid only within a given period of time. This applies especially to systems of an administrative nature, with their user-facing screens and persisted data of many kinds. The validity of data is reflected by dates, time stamps and time intervals. As an expression of periods of validity and duration, intervals are especially interesting.

Across development in general and across people with different focus, there is broad consensus that a time interval is expressed by indicating two dates or two time stamps, and that such time intervals are something you can handle right away and in a good way. The mathematical interpretation is that a time interval is simply a contiguous segment of the one-dimensional timeline. It should be simple to handle.

When you actually want to handle it, there are several issues that require consideration and must be taken seriously: granularity, homogeneity and representation. The following is a short presentation of these issues. Even though the presentation is somewhat abstract, the effects show up very concretely.

Granularity

The subject of granularity appears in several variants and for a number of reasons. Granularity emerges as an issue because some people focus on an application's presentation and quickly conclude that, from a working standpoint, validity obviously runs in whole days. The reasoning is that when one thing is valid until January the 10th and a second thing applies with effect from January the 11th, then these express disjoint intervals:


[…; January 10th 2011] , [January 11th 2011; …]

There is no doubt as to which interval a day belongs to.

Other people, like an IT architect, focus on the domain logic of an application, and they know that the user-facing screens of an application tend to change over time. What initially could be formulated as validity in terms of whole days may later be formulated in parts of a day and counted in hours, minutes and seconds. While durations of time may initially appear to be best presented in terms of whole days, it is best to avoid tying this level of granularity into the domain logic. It is more tractable not to have to modify the domain logic at all than to have a certain granularity baked into the application in both the presentation and the domain logic. With a focus on the domain logic, it is natural to choose a finer granularity, which includes not only whole days, but seconds and perhaps even milliseconds. The previous time intervals from the presentation must thus be expressed in a different way, with a finer granularity:


[...; 2011-01-10 23:59:59.999], [2011-01-11 00:00:00.000; ...]

Just as the domain logic is an issue in relation to the presentation, it is also an issue in relation to the persisted data. Persisted data may have a coarse granularity like the presentation, or it may have varying granularity where some parts are persisted at a coarse granularity in whole days and other parts at a finer granularity including seconds. When a given set of persisted data is inspected from the values alone, it can be quite difficult to determine whether the data represent closed intervals like –


[...; January 10th 2011]

– or if the data represent open intervals like –


[...; January 10th 2011[

– which easily makes a difference, since there is a day in between. Only by taking a more active role and examining how consecutive, disjoint intervals are represented does it become possible to determine whether the end-point of one interval is different from the starting point of the consecutive interval –


[...; January 10th 2011] , [January 11th 2011; ...]

– or if the two points are equal, which opens up for choosing between either the interpretation –


[...; January 10th 2011[ , [January 10th 2011; ...[

– or the interpretation –


]...; January 10th 2011] , ]January 10th 2011; ...]

– which again makes a day of difference. For data with a coarser granularity like whole days, it becomes all the more important to handle the endpoints correctly as either open or closed, as illustrated in the sketch below.
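
As a small illustration of how much the interpretation matters, the following sketch uses Joda Time's LocalDate purely as an example type; with a closed end-point the boundary date lies inside the interval, with a half-open end-point it does not:

import org.joda.time.LocalDate;

public class EndpointInterpretation
{
  // Closed interval ending at 'end': the end date itself is included.
  public static boolean isInClosed(LocalDate date,LocalDate end)
  {
    return !date.isAfter(end);
  }

  // Half-open interval ending at 'end': the end date itself is excluded.
  public static boolean isInHalfOpen(LocalDate date,LocalDate end)
  {
    return date.isBefore(end);
  }

  public static void main(String[] args)
  {
    LocalDate end=new LocalDate(2011,1,10);
    LocalDate date=new LocalDate(2011,1,10);

    System.out.println(isInClosed(date,end));   // true
    System.out.println(isInHalfOpen(date,end)); // false - a whole day of difference
  }
}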

Homogeneity

The handling of data found in the domain logic serves a purpose between the presentation and the persistence. It is therefore natural to focus on ensuring that there is one common way to express time intervals. If it is possible to avoid functionality to compare both open and closed intervals, if it is possible to avoid functionality to compare time intervals at different levels of granularity like whole days and seconds, and if it is possible to avoid comparing intervals across both, then the domain logic becomes tractable. It is preferable to have a uniform representation.

As part of a presentation, time intervals are selected and displayed, but never manipulated or modified. It is possible that some trickery is required when selecting endpoints. If a user selects dates as an interval like –


[January 15th 2011; January 25th 2011]

– then it may happen that the communication between the presentation and the domain layer should be at a different level of granularity such as –


[2011-01-15 00:00:00; 2011-01-25 23:59:59]

– or as –


[2011-01-15 00:00:00; 2011-01-26 00:00:00[

– depending upon how the domain logic is constructed. As part of the presentation, it is possible to handle the fact that an endpoint chosen as January 25th is sent to the domain logic as January 26th, and that an endpoint received as January 26th is presented as January 25th. As long as the presentation does not change the meaning of time intervals, as is otherwise done within the domain logic, it is possible to perform these conversions in a good and sound way.
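
As a sketch of such a conversion, again using Joda Time purely for illustration and assuming that the domain logic uses half-open intervals ending at the start of the following day, the two directions could look like this (the class and method names are mine):

import org.joda.time.DateTime;
import org.joda.time.LocalDate;

public class PresentationConversion
{
  // A date chosen as an inclusive end in the presentation becomes an
  // exclusive end in the domain logic: the start of the following day.
  public static DateTime toDomainEnd(LocalDate selectedEnd)
  {
    return selectedEnd.plusDays(1).toDateTimeAtStartOfDay();
  }

  // An exclusive end received from the domain logic is shown in the
  // presentation as the day before (assuming it falls on a day boundary).
  public static LocalDate toPresentationEnd(DateTime domainEnd)
  {
    return domainEnd.toLocalDate().minusDays(1);
  }
}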

Something similar is true regarding persistence. When the meaning of time intervals is kept unchanged, then it becomes possible to handle conversions between domain logic and persistence.

It is important that the domain logic has a single, homogeneous handling of time intervals. This can be implemented directly by explicitly using only a single representation of time intervals. Or it can be implemented indirectly by using one of the more advanced libraries, which may internally use various representations of time intervals and perform conversions without ever exposing them to the actual domain logic.
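
A direct implementation could be as small as the following sketch: a single, immutable value class representing a half-open interval of millisecond instants. The class name and the methods are illustrative and not taken from any particular library:

// A minimal, immutable half-open time interval [start; end[ measured in
// milliseconds since the epoch. All domain logic uses this one representation.
public final class HalfOpenInterval
{
  private final long startMillis;
  private final long endMillis;

  public HalfOpenInterval(long startMillis,long endMillis)
  {
    if (endMillis<startMillis)
    {
      throw new IllegalArgumentException("End must not precede start");
    }
    this.startMillis=startMillis;
    this.endMillis=endMillis;
  }

  public boolean contains(long instantMillis)
  {
    return instantMillis>=startMillis && instantMillis<endMillis;
  }

  public boolean overlaps(HalfOpenInterval other)
  {
    return startMillis<other.endMillis && other.startMillis<endMillis;
  }

  // Two intervals abut when one ends exactly where the other starts.
  public boolean abuts(HalfOpenInterval other)
  {
    return endMillis==other.startMillis || other.endMillis==startMillis;
  }
}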

Representation

Most IT systems are administrative in nature, where time intervals relate to human time. The quick conclusion is to agree that when time intervals are represented, it is sufficient to express the endpoints with an accuracy of seconds or perhaps even milliseconds [1].

This means that where endpoints are shown as part of a presentation and are chosen as whole days –


[January 15th 2011; January 25th 2011]

– the most immediate choice is to have the domain logic settle on a representation with a finer granularity such as –


[2011-01-15 00:00:00.000; 2011-01-25 23:59:59.999]

– which has a resolution of milliseconds. This must be perfect, right?

Human time, as governed by the clock and the sun, is continuous. Digital machine time is discrete and, with the representation above, is measured in whole milliseconds. If man and machine want to represent two time intervals which meet at the change between January the 25th and January the 26th, it must be represented as –


[...; 2011-01-25 23:59:59.999], [2011-01-26 00:00:00.000; ...]

Here, the granularity of milliseconds shows itself in the fact that the value for the end-point of the first time interval is not strictly equal to the value for the starting point of the second time interval. To construct these intervals, the finer granularity must be handled explicitly by subtracting a millisecond. This handling ends up being expressed explicitly in the domain logic, even in the commonly occurring situations where validity is measured at a much coarser granularity like whole days or whole hours. A sketch of this construction follows below.
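
As a sketch, and using Joda Time's DateTime purely for illustration, the explicit handling looks like this:

// With closed end-points, two intervals meeting at the change between
// January 25th and January 26th must be constructed by explicitly
// subtracting one millisecond at the boundary (org.joda.time.DateTime):
long boundaryMillis=new DateTime(2011,1,26,0,0,0,0).getMillis();

long firstEndMillis=boundaryMillis-1;   // 2011-01-25 23:59:59.999
long secondStartMillis=boundaryMillis;  // 2011-01-26 00:00:00.000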

What also tends to happen is that the time interval is saved in a database which happens to have a granularity of seconds. An interval like the one above, having a slightly finer granularity including milliseconds, is then persisted at a coarser granularity without the final fraction –


[...; 2011-01-25 23:59:59], [2011-01-26 00:00:00; ...]

– and when the interval is later read back, then without cumbersome special handling it becomes something like –


[...; 2011-01-25 23:59:59.000], [2011-01-26 00:00:00.000; ...]

The initial interval endpoint was 2011-01-25 23:59:59.999 but became 2011-01-25 23:59:59.000, which is not the same value.

So the presentation may count in whole days while the domain logic has a homogeneous representation in milliseconds, and yet the relationship between milliseconds and seconds is not handled properly. This occurs even if only a coarser resolution like whole days is used and hours or minutes are never used at all. This is not good.

Both to make the continuous meet the discrete and to avoid being forced into cumbersome conversions, and even when the intermediate granularities of hours and minutes are not used, it is advantageous to use time intervals which are open at the end point. The time intervals are then represented as –


[...; 2011-01-26 00:00:00.000[, [2011-01-26 00:00:00.000; ...[

– and should the intervals happen to be persisted without milliseconds as –


[...; 2011-01-26 00:00:00[, [2011-01-26 00:00:00; ...[

– then they can be read again by simply resetting the not persisted fraction containing milliseconds to 000 –


[...; 2011-01-26 00:00:00.000[, [2011-01-26 00:00:00.000; ...[

– so that what was written is equal to what is read. As long as the presentation only shows days, hours, minutes and seconds, there is no distinction between the continuous and the discrete. The action required to make a new time interval that starts where the previous interval ended becomes a very trivial assignment of endpoints, as shown in the sketch below.
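
With the hypothetical HalfOpenInterval class sketched earlier, making an interval that continues from the previous one is just a matter of reusing the end-point:

// The end of the previous half-open interval becomes, unchanged, the
// start of the next one - no subtraction of milliseconds is required:
HalfOpenInterval previous=new HalfOpenInterval(startMillis,endMillis);
HalfOpenInterval next=new HalfOpenInterval(endMillis,laterEndMillis);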

The choice between open and closed intervals is absolutely not a triviality.

Conclusion

To summarise, the overall conclusion is this:


In general IT systems of an administrative nature, it is advantageous to represent the validity of data in time as intervals where the granularity is milliseconds and the interval is half-open, as indicated by the form of this example:

[2011-01-01 00:00:00.000; 2011-02-01 00:00:00.000[

Regardless of the details that lead to this conclusion, there are signs that others have reached the same result. One can look at libraries: many libraries represent times and dates in many guises, but only a few have representations of time intervals. One of the better libraries is Joda Time [2]. When looking closely at this library, it is worth noticing that the definition of a time interval [3] is documented as follows:


...
A time interval represents a period of time between two instants. Intervals are inclusive of the start instant and exclusive of the end.
...
Intervals have a fixed millisecond duration.
...

How the authors of this very rich library came to the conclusion that time intervals should have only a single representation, one can only guess.
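
A small usage sketch of org.joda.time.Interval shows the documented half-open behaviour in practice; the dates are arbitrary:

import org.joda.time.DateTime;
import org.joda.time.Interval;

public class JodaIntervalExample
{
  public static void main(String[] args)
  {
    DateTime boundary=new DateTime(2011,1,26,0,0,0,0);

    Interval first=new Interval(new DateTime(2011,1,15,0,0,0,0),boundary);
    Interval second=new Interval(boundary,new DateTime(2011,2,1,0,0,0,0));

    System.out.println(first.contains(boundary));  // false - end is exclusive
    System.out.println(second.contains(boundary)); // true  - start is inclusive
    System.out.println(first.abuts(second));       // true  - they meet exactly
  }
}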

Regarding the handling of time intervals, I have my own conclusions: handling time intervals is not a trivial task, there is a most correct way to handle them, and it is best to handle them correctly from the very start of an implementation.

Morten Sabroe Mortensen


[1] Alternative conclusions apply when the systems in question are live, real-time systems where scheduling is a vital topic. In this case, the timeline is not human time, but the global beat of a CPU. This beat can be read accurately: for a CPU running at 3 GHz, it is possible to use the operating system to access the global CPU counter, which is incremented 3*10^9 = 3,000,000,000 times per second. Here it is relevant to measure time in microseconds and nanoseconds. This precision is far beyond the capabilities of a typical operating system and its interrupt-driven clock, where each update usually drifts by multiple milliseconds depending upon internal scheduling. The premises are different, and the conclusion about what is good enough is a different one.

[2] See http://joda-time.sourceforge.net/.

[3] See http://joda-time.sourceforge.net/api-release/org/joda/time/ReadableInterval.html.