Thursday, February 28, 2013

Java TimeUnit Conversion

When Java 5 (J2SE 1.5/5.0) was released in 2004, it included Doug Lea's Java concurrency library. This library includes functionality for, e.g., scheduling threads at particular instants in time. Parts of it use an enumeration named java.util.concurrent.TimeUnit, which continues to be present in Java SE 6, 7 and 8.

The time unit object represents nanoseconds, microseconds, milliseconds, seconds, minutes, hours and days. In the context of the concurrency library, a time duration is represented as an integer of type long and refers to one of the time units. Conversion of time duration values can be done easily by using the method TimeUnit.convert(long,TimeUnit).

This method has an annoying property: It does not handle rounding. If a time duration of 800 microseconds is converted to milliseconds, the result becomes 0 milliseconds. This is not always preferable. In some cases it would be nice to have rounding applied and to end up with a result of 1 millisecond. Rounding is not offered by the time unit object.
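
The truncation is easy to observe directly. The following is a minimal sketch; the class name TruncationDemo and the helper microsToMillis() are made up for illustration, while TimeUnit.convert() is the standard library method discussed above:

```java
import java.util.concurrent.TimeUnit;

class TruncationDemo
{
  // Converts microseconds to milliseconds with the built-in, truncating convert().
  static long microsToMillis(long micros)
  {
    return TimeUnit.MILLISECONDS.convert(micros,TimeUnit.MICROSECONDS);
  }

  public static void main(String[] args)
  {
    System.out.println(microsToMillis(800));   // prints 0
    System.out.println(microsToMillis(1999));  // prints 1
    System.out.println(microsToMillis(-800));  // prints 0, truncation is toward zero
  }
}
```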

In general, rounding is easy to implement. Converting hours to days using truncation is just an integer division by 24. To do the same conversion with rounding, first add 12 hours and then divide by 24 hours/day, giving a result in days. This is exactly what happens when rounding non-negative floating-point values to integers: first add 0.5 and then truncate the fraction, leaving only the integer part. Simple.
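
As a small worked example of the arithmetic, and assuming non-negative durations only, the two variants can be sketched like this (the class and method names are illustrative and not part of any library):

```java
class HoursToDays
{
  // Truncating conversion: plain integer division.
  static long truncate(long hours)
  {
    return hours/24;
  }

  // Rounding conversion for non-negative input: add half a day, then divide.
  static long round(long hours)
  {
    return (hours+12)/24;
  }

  public static void main(String[] args)
  {
    System.out.println(truncate(36));  // prints 1
    System.out.println(round(36));     // prints 2
    System.out.println(round(11));     // prints 0
  }
}
```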

For the time unit object, this functionality seems to be hard to obtain. If the source time duration has a coarser granularity than the target time duration then it is possible to perform the conversion by using convert() directly:

long targetDuration=targetUnit.convert(sourceDuration,sourceUnit);

Within the implementation, a simple multiplication is applied.

If the source time duration has a finer granularity than the target time duration then TimeUnit.convert() will apply division and truncate. To get rounding, it is possible to convert 1 from the target unit space to the source unit space, and then add half of this to the source time duration before truncating:

long targetToSourceFactor=sourceUnit.convert(1,targetUnit);
long targetDuration=targetUnit.convert(sourceDuration+targetToSourceFactor/2,
                                       sourceUnit);

The very nice thing about this is that it works for all combinations of time units.

When put into a context of more complete code, the end result may be this:

import java.util.concurrent.TimeUnit;

public class TimeUnitUtility
{
  public static long convertWithRounding(long sourceDuration,
                                         TimeUnit sourceUnit,
                                         TimeUnit targetUnit)
  {
    long res=0;
    
    {
      if (sourceUnit==targetUnit)
      {
        res=sourceDuration;
      }
      else
      {
        if (sourceDuration<0)
        {
          res=-convertWithRounding(-sourceDuration,sourceUnit,targetUnit);
        }
        else
        {
          // Enum order runs from NANOSECONDS (finest) to DAYS (coarsest).
          int order=targetUnit.compareTo(sourceUnit);
      
          if (order<=0)
          {
            res=targetUnit.convert(sourceDuration,sourceUnit);
          }
          else
          {
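            // Add half a target unit, expressed in source units, before truncating.
            // Note: the addition can overflow for durations close to Long.MAX_VALUE.
            long targetToSourceFactor=sourceUnit.convert(1,targetUnit);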
            res=targetUnit.convert(sourceDuration+targetToSourceFactor/2,
                                   sourceUnit);
          }
        }
      }
    }
    
    return res;
  }
}
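
To see the behaviour for a few concrete values, here is a self-contained sketch. It carries a condensed restatement of convertWithRounding() from the listing above so that it can run on its own; the class name RoundingUsage is made up for the example.

```java
import java.util.concurrent.TimeUnit;

class RoundingUsage
{
  // Condensed restatement of convertWithRounding() from the listing above.
  static long convertWithRounding(long sourceDuration,
                                  TimeUnit sourceUnit,
                                  TimeUnit targetUnit)
  {
    if (sourceDuration<0)
    {
      return -convertWithRounding(-sourceDuration,sourceUnit,targetUnit);
    }
    if (targetUnit.compareTo(sourceUnit)<=0)
    {
      // Target is the same or a finer unit: nothing to round.
      return targetUnit.convert(sourceDuration,sourceUnit);
    }
    long targetToSourceFactor=sourceUnit.convert(1,targetUnit);
    return targetUnit.convert(sourceDuration+targetToSourceFactor/2,sourceUnit);
  }

  public static void main(String[] args)
  {
    System.out.println(convertWithRounding(800,TimeUnit.MICROSECONDS,TimeUnit.MILLISECONDS));   // prints 1
    System.out.println(convertWithRounding(499,TimeUnit.MICROSECONDS,TimeUnit.MILLISECONDS));   // prints 0
    System.out.println(convertWithRounding(-800,TimeUnit.MICROSECONDS,TimeUnit.MILLISECONDS));  // prints -1
    System.out.println(convertWithRounding(36,TimeUnit.HOURS,TimeUnit.DAYS));                   // prints 2
    System.out.println(convertWithRounding(2,TimeUnit.MILLISECONDS,TimeUnit.MICROSECONDS));     // prints 2000
  }
}
```

Negative durations are rounded away from zero at the halfway point, mirroring the positive case.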

There exists another way to do the same thing. Instead of using compareTo() it is possible to convert the time duration to the finest time unit and then convert this duration to the target duration. This involves three convert() operations:

public class TimeUnitUtility
{
  public static long convertWithRounding2(long sourceDuration,
                                          TimeUnit sourceUnit,
                                          TimeUnit targetUnit)
  {
    long res=0;
    
    {
      if (sourceUnit==targetUnit)
      {
        res=sourceDuration;
      }
      else
      {
        if (sourceDuration<0)
        {
          res=-convertWithRounding2(-sourceDuration,sourceUnit,targetUnit);
        }
        else
        {
          TimeUnit finestUnit=TimeUnit.NANOSECONDS;
          long finestDuration=finestUnit.convert(sourceDuration,sourceUnit);

          long targetToFinestFactor=finestUnit.convert(1,targetUnit);
          res=targetUnit.convert(finestDuration+targetToFinestFactor/2,
                                 finestUnit);
        }
      }
    }
    
    return res;
  }
}
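
One caveat with routing everything through nanoseconds may be worth keeping in mind: TimeUnit.convert() saturates at Long.MAX_VALUE or Long.MIN_VALUE when a conversion to a finer unit would overflow, and a long holds only about 292 years' worth of nanoseconds. The sketch below shows a conservative range check; the class NanosecondRange and the method fitsInNanoseconds() are hypothetical helpers and not part of the archive.

```java
import java.util.concurrent.TimeUnit;

class NanosecondRange
{
  // True when the duration can be expressed in nanoseconds without convert()
  // saturating. Conservative: a duration of exactly Long.MAX_VALUE or
  // Long.MIN_VALUE nanoseconds is also reported as not fitting.
  static boolean fitsInNanoseconds(long duration,TimeUnit unit)
  {
    long nanos=TimeUnit.NANOSECONDS.convert(duration,unit);
    return nanos!=Long.MAX_VALUE && nanos!=Long.MIN_VALUE;
  }

  public static void main(String[] args)
  {
    System.out.println(fitsInNanoseconds(106751,TimeUnit.DAYS));  // prints true
    System.out.println(fitsInNanoseconds(106752,TimeUnit.DAYS));  // prints false
  }
}
```

For durations outside this range the compareTo() based variant is the safer choice, since it never passes through a finer unit than the source unit.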

A static method like this cannot serve as an abstract convert() operation with two interchangeable implementations - one performing truncation and the other performing rounding. One solution is to declare the abstract method in the form of an interface -

import java.util.concurrent.TimeUnit;

public interface TimeUnitConverter
{
  long convert(long sourceDuration,
               TimeUnit sourceUnit,
               TimeUnit targetUnit);
}

- and then implement two different implementations, which can be instantiated and passed around:

public class TruncateTimeUnitConverter implements TimeUnitConverter
{
  @Override
  public long convert(long sourceDuration,
                      TimeUnit sourceUnit,
                      TimeUnit targetUnit)
  {
    return targetUnit.convert(sourceDuration,sourceUnit);
  }
}

public class RoundTimeUnitConverter implements TimeUnitConverter
{
  @Override
  public long convert(long sourceDuration,
                      TimeUnit sourceUnit,
                      TimeUnit targetUnit)
  {
    return TimeUnitUtility.convertWithRounding(sourceDuration,sourceUnit,
                                               targetUnit);
  }
}

However, it is also possible to make the conversion parametrized and implement strict rounding as defined by java.math.RoundingMode. The set of rounding modes includes FLOOR, CEILING, UP, DOWN, HALF_UP, HALF_DOWN, HALF_EVEN and UNNECESSARY.

import java.math.RoundingMode;
import java.util.concurrent.TimeUnit;

public class TimeUnitUtility
{
  ...
  public static long convert(long sourceDuration,
                             TimeUnit sourceUnit,
                             TimeUnit targetUnit,
                             RoundingMode roundingMode)
  {
    long res=0;
    
    {
      if (roundingMode==null)
      {
        String message=
          "Failure to convert; rounding mode must be set!";
        throw new IllegalArgumentException(message);
      }
      else
      {
        switch (roundingMode)
        {
          case FLOOR:
          {
            res=convert_FLOOR(sourceDuration,sourceUnit,targetUnit);
            break;
          }
        
          case CEILING:
          {
            res=convert_CEILING(sourceDuration,sourceUnit,targetUnit);
            break;
          }
          
          case UP:
          {
            res=convert_UP(sourceDuration,sourceUnit,targetUnit);
            break;
          }
          
          case DOWN:
          {
            res=convert_DOWN(sourceDuration,sourceUnit,targetUnit);
            break;
          }
          
          case HALF_UP:
          {
            res=convert_HALF_UP(sourceDuration,sourceUnit,targetUnit);
            break;
          }
          
          case HALF_DOWN:
          {
            res=convert_HALF_DOWN(sourceDuration,sourceUnit,targetUnit);
            break;
          }
          
          case HALF_EVEN:
          {
            res=convert_HALF_EVEN(sourceDuration,sourceUnit,targetUnit);
            break;
          }
          
          case UNNECESSARY:
          {
            res=convert_UNNECESSARY(sourceDuration,sourceUnit,targetUnit);
            break;
          }
          
          default:
          {
            String message=
              "Failure to convert; rounding mode \""+
              roundingMode.name()+
              "\" can not be recognised!";
            throw new IllegalArgumentException(message);
          }
        }
      }
    }
    
    return res;
  }
  ...
}
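
The convert_* helpers are not shown here. As a rough illustration of what they have to compute, and not the implementation found in the archive, the whole dispatch can be approximated by letting java.math.BigDecimal apply the rounding mode to the division. The class name RoundingModeSketch is an assumption made for this example.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.concurrent.TimeUnit;

class RoundingModeSketch
{
  static long convert(long sourceDuration,
                      TimeUnit sourceUnit,
                      TimeUnit targetUnit,
                      RoundingMode roundingMode)
  {
    if (roundingMode==null)
    {
      throw new IllegalArgumentException("Failure to convert; rounding mode must be set!");
    }
    if (targetUnit.compareTo(sourceUnit)<=0)
    {
      // Same or finer target unit: plain multiplication, nothing to round.
      return targetUnit.convert(sourceDuration,sourceUnit);
    }
    // Number of source units per target unit, e.g. 1000 for microseconds to milliseconds.
    long factor=sourceUnit.convert(1,targetUnit);
    // Divide with scale 0 so that BigDecimal applies the requested rounding mode.
    // RoundingMode.UNNECESSARY makes divide() throw ArithmeticException if rounding is needed.
    BigDecimal quotient=
      BigDecimal.valueOf(sourceDuration).divide(BigDecimal.valueOf(factor),0,roundingMode);
    return quotient.longValueExact();
  }

  public static void main(String[] args)
  {
    System.out.println(convert(1500,TimeUnit.MICROSECONDS,TimeUnit.MILLISECONDS,RoundingMode.HALF_UP));    // prints 2
    System.out.println(convert(1500,TimeUnit.MICROSECONDS,TimeUnit.MILLISECONDS,RoundingMode.HALF_DOWN));  // prints 1
    System.out.println(convert(-800,TimeUnit.MICROSECONDS,TimeUnit.MILLISECONDS,RoundingMode.FLOOR));      // prints -1
    System.out.println(convert(-800,TimeUnit.MICROSECONDS,TimeUnit.MILLISECONDS,RoundingMode.CEILING));    // prints 0
  }
}
```

Going through BigDecimal trades some speed for brevity; dedicated long arithmetic per rounding mode, as the switch statement above suggests, avoids the object allocation.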

For a complete context of these examples and including a test suite, please refer to the archive Yelstream-TimeUnitUtility_2013-02-28.zip. This archive contains an Eclipse Java project ready for import. All eight rounding modes are implemented for both positive and negative time durations and accompanied by unit test cases.

Wednesday, February 27, 2013

The Character Encoding Circus of XML-over-HTTP

Welcome to the Circus

Transmitting data in the form of XML over the HTTP protocol in an ad hoc manner is simple. Just put the XML data into the body of an HTTP request, send the request, and grab the returned XML data from the body of the returned HTTP response. Nothing could be simpler - or so it seems. The fact is that the process is somewhat elaborate and contains intrinsic details which only a few programmers get right.

When transmitting data over HTTP it is a good idea to state the nature of the content. This is done by setting the content-type header, which contains a MIME type. The header may be something like -

Content-Type: text/plain; charset="UTF-8"

This header may also define the character encoding. When the character encoding is not set, most textual values within the context of HTTP default to, or must be, ISO-8859-1. No matter which character encoding the content-type header specifies, some things must always be encoded using ISO-8859-1. This is the case for HTTP header values in general. It is even the case for the Base64 encoded content "<username>:<password>" found within HTTP Basic and set by clients in the Authorization header - this does not depend upon any encoding set in the content-type header, contrary to what many programmers believe. This is due to the origin and legacy of the HTTP protocol. If things were different and a choice was present, the encoding ISO-8859-1 would not always be preferable over other encodings.
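
To make the point about HTTP Basic concrete, the following sketch builds the Authorization header value from a given charset. It follows the statement above that ISO-8859-1 is the encoding to use; the class BasicAuthHeader and the method headerValue() are hypothetical names, and java.util.Base64 requires Java 8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthHeader
{
  // Builds the value of the Authorization header for HTTP Basic.
  // The charset decides which bytes end up inside the Base64 text.
  static String headerValue(String username,String password,Charset charset)
  {
    byte[] credentials=(username+":"+password).getBytes(charset);
    return "Basic "+Base64.getEncoder().encodeToString(credentials);
  }

  public static void main(String[] args)
  {
    // \u00e4 is the letter a with diaeresis, written as an escape to keep the source ASCII.
    String latin1=headerValue("user","p\u00e4ssword",StandardCharsets.ISO_8859_1);
    String utf8=headerValue("user","p\u00e4ssword",StandardCharsets.UTF_8);
    System.out.println(latin1.equals(utf8));  // prints false
  }
}
```

For pure ASCII credentials the two encodings give identical header values, which is why the mismatch often goes unnoticed until a non-ASCII password shows up.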

When the content-type does not specify a character encoding, different rules apply depending upon the MIME type used. In one case the XML content must be interpreted as US-ASCII, while in another case it must be interpreted as binary - reading the optional byte order mark (BOM) and processing the encoding of the initial processing instruction -

<?xml version="1.0" encoding="UTF-16"?>

This implies that the "encoding" declaration in the processing instruction may be unnecessary and ignored.

Should it happen that some character encoding is not set right, then it may not pose a problem if both the client and the server agree upon how to do things. It is often the case that clients and servers are not coded by the same programmers, or that the clients and servers have to be compatible with existing software, and in this case things must be done right. When the transmission of XML data over HTTP is not done right, the character encoding circus is open and may imply expensive debugging sessions or result in middleware which does not quite cut it and which does not work in all intended use cases.

Specifications

When transmitting XML most professional programmers use the MIME type application/xml. There is another MIME type in common use - the type text/xml. While both types are intended for the transmission of XML, they do have different interpretations.

Specifications do exist. The RFC 3023 does specify how five different media types for XML are supposed to be used. It is not easy to read but does contain strict answers.

Regarding the basic types text/xml and application/xml the RFC 3023 contains a section "3.6 Summary" with some clear statements. What applies to text/xml - among other things - is this:

  • Specifying the character encoding "charset" as part of the content-type is strongly recommended.
  • If the "charset" parameter of the content-type is not specified, then the default is "US-ASCII". The default of "ISO-8859-1" in HTTP is explicitly overridden.
  • An "encoding" declaration in the initial processing instruction of the XML, if present, is irrelevant.

It may well come as a surprise that the character encoding US-ASCII - and not the implicit encoding ISO-8859-1 of HTTP - is the default.

The summary also contains statements about the type application/xml:

  • Specifying the character encoding "charset" as part of the content-type is strongly recommended, and if present, it takes precedence.
  • If the "charset" parameter of the content-type is not specified, conforming XML processors must follow the requirements in section 4.3.3 of the Extensible Markup Language (XML) 1.0 specification ([XML]).

What this explains is that the content is either 1) to be read and written textually and using the "charset" of the content-type header, or 2) to be read and written binary using the "encoding" declaration in the initial processing instruction of the XML.
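
The rules summarized above can be condensed into a small decision function. This is only a sketch: the class XmlCharsetRule and the method effectiveCharset() are made-up names, the header parsing is deliberately simplistic (it does not handle semicolons inside quoted parameter values), and only the two basic types are covered.

```java
import java.util.Locale;

class XmlCharsetRule
{
  // Returns the charset to decode the body with, or null when the XML processor
  // must detect the encoding itself (BOM and "encoding" declaration).
  static String effectiveCharset(String contentType)
  {
    String[] parts=contentType.split(";");
    String mimeType=parts[0].trim().toLowerCase(Locale.ROOT);

    String charset=null;
    for (int i=1; i<parts.length; i++)
    {
      String parameter=parts[i].trim();
      int eq=parameter.indexOf('=');
      if (eq>0 && parameter.substring(0,eq).trim().equalsIgnoreCase("charset"))
      {
        // Strip optional quotes around the value.
        charset=parameter.substring(eq+1).trim().replace("\"","");
      }
    }

    if (charset!=null && !charset.isEmpty())
    {
      return charset;     // An explicit charset takes precedence for both types.
    }
    if (mimeType.equals("text/xml"))
    {
      return "US-ASCII";  // Default for text/xml, overriding HTTP's ISO-8859-1.
    }
    if (mimeType.equals("application/xml"))
    {
      return null;        // Defer to section 4.3.3 of the XML specification.
    }
    throw new IllegalArgumentException("MIME type not covered by this sketch: "+mimeType);
  }

  public static void main(String[] args)
  {
    System.out.println(effectiveCharset("text/xml"));                            // prints US-ASCII
    System.out.println(effectiveCharset("application/xml"));                     // prints null
    System.out.println(effectiveCharset("application/xml; charset=\"UTF-8\""));  // prints UTF-8
  }
}
```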

It is recommended to handle the XML textually by using the form -

Content-Type: application/xml; charset="UTF-8"

- containing a character encoding like "UTF-8" and in this case the "encoding" declaration in the initial processing instruction of the XML like -

<?xml version="1.0" encoding="ISO-8859-1"?>

- is to be ignored. To avoid confusion it would in fact look much better with -

<?xml version="1.0"?>

- since UTF-8 and ISO-8859-1 are not compatible or interchangeable.
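
On the sending side, the practical consequence is that the header and the bytes of the body should be derived from one and the same charset. A minimal sketch, with XmlPayload, contentType() and body() being names invented for this example:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class XmlPayload
{
  // Header value that announces the charset actually used for the body.
  static String contentType(Charset charset)
  {
    return "application/xml; charset=\""+charset.name()+"\"";
  }

  // Body bytes, encoded with the very same charset.
  static byte[] body(String xml,Charset charset)
  {
    return xml.getBytes(charset);
  }

  public static void main(String[] args)
  {
    // \u00e6 is a non-ASCII letter, written as an escape to keep the source ASCII.
    String xml="<?xml version=\"1.0\"?><a>\u00e6</a>";
    System.out.println(contentType(StandardCharsets.UTF_8));  // prints application/xml; charset="UTF-8"
    System.out.println(body(xml,StandardCharsets.UTF_8).length-
                       body(xml,StandardCharsets.ISO_8859_1).length);  // prints 1
  }
}
```

The single non-ASCII letter takes two bytes in UTF-8 and one byte in ISO-8859-1, so a receiver that decodes with the wrong one of the two will not get the original character back.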

Section 4.3.3 of [XML] states how to read XML content from a binary stream, just as is done when reading from a file. First the BOM is read together with the initial XML processing instruction and its "encoding" declaration, and then the reader interprets the binary stream textually according to the byte order and the character encoding. Some of the details of how to do this can be read in another, quite interesting section, F Autodetection of Character Encodings (Non-Normative), of [XML].
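
The first step of such a binary read, looking for a byte order mark, can be sketched as follows. This covers only the BOM part; the class BomSniffer is a made-up name, and a real reader would go on to inspect the "encoding" declaration when no BOM is found.

```java
class BomSniffer
{
  // Returns the encoding implied by a leading byte order mark, or null if there is none.
  static String detect(byte[] head)
  {
    // The 4-byte marks must be tested first, since FF FE 00 00 also starts with FF FE.
    if (startsWith(head,0x00,0x00,0xFE,0xFF)) return "UTF-32BE";
    if (startsWith(head,0xFF,0xFE,0x00,0x00)) return "UTF-32LE";
    if (startsWith(head,0xEF,0xBB,0xBF)) return "UTF-8";
    if (startsWith(head,0xFE,0xFF)) return "UTF-16BE";
    if (startsWith(head,0xFF,0xFE)) return "UTF-16LE";
    return null;
  }

  private static boolean startsWith(byte[] data,int... prefix)
  {
    if (data.length<prefix.length)
    {
      return false;
    }
    for (int i=0; i<prefix.length; i++)
    {
      // Mask to compare as unsigned byte values.
      if ((data[i]&0xFF)!=prefix[i])
      {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args)
  {
    byte[] utf8={(byte)0xEF,(byte)0xBB,(byte)0xBF,'<','?'};
    byte[] plain={'<','?','x','m','l'};
    System.out.println(detect(utf8));   // prints UTF-8
    System.out.println(detect(plain));  // prints null
  }
}
```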

Data in the form of XML is to be handled differently when contained as text as opposed to a binary stream of octets.

Strange Behaviour and Incompatibility

Even though the specification of text/xml and application/xml is more than a decade old - it is from the year 2001 and the infancy of the world of adopting SGML in its new incarnation of XML - the differences between the types are hardly common knowledge. These MIME types are far from the only ones used for XML, but both are in common use for many not-always-ingenious and proprietary ways to communicate XML.

Complex matters tend to become a circus of strange behaviour and incompatibility only when present and existing specifications are not followed and adhered to.

Once the circus is open it includes some of the most unbelievable, unanticipated, hard-to-address, restrictive effects, and the extravaganza of examples present in the real world of application programming is not even touched upon here.

Should you ever hear statements about text/xml and application/xml being equal - or not being in common use - then you have acquired one of many tickets to the circus.