Background
In my previous post I described a simple Scala server using NIO and continuations, and mentioned in the Limitations section that the example did not convert the data bytes to characters. In this post I show how that can easily be added by using another feature of the Java NIO package: character-set encoders and decoders.
Java NIO Character Coders
The java.nio.charset package includes a Charset class that represents a mapping between the 16-bit Unicode code units that Java uses for its internal representation of characters and strings, and a sequence of bytes as stored in a file or transmitted through a socket connection. Each such mapping is represented by a separate instance of the Charset class. Standard character mappings such as "UTF-8" and "ISO-8859-1" can be retrieved using the static forName method.
Given an instance of Charset, a CharsetEncoder for that character mapping can be retrieved by calling the newEncoder method on that instance. That encoder can then be used to convert a Java string into a sequence of bytes suitable for writing to a file or connection. Similarly, the newDecoder method on Charset retrieves a CharsetDecoder that can be used for the complementary task of converting bytes from a file or connection into a Java string.
The encoding and decoding methods convert data between a CharBuffer and a ByteBuffer. Since the java.nio socket I/O calls we are using read and write their data to and from ByteBuffers, it is convenient for the encoding and decoding to use those objects.
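As a quick illustration of the calls described above, here is a minimal sketch of a round trip from a string to bytes and back. The object name CharsetRoundTrip and its two helper methods are made up for this illustration; only Charset.forName, newEncoder, newDecoder, and the convenience encode and decode methods come from java.nio.charset.

```scala
import java.nio.{ByteBuffer, CharBuffer}
import java.nio.charset.Charset

// Hypothetical helper object, used only for this illustration.
object CharsetRoundTrip {
  // Look up a standard charset by name.
  val utf8: Charset = Charset.forName("UTF-8")

  // Encode a string to bytes. A fresh encoder is created per call
  // because encoders are not safe for use by multiple threads.
  def toBytes(s: String): ByteBuffer =
    utf8.newEncoder().encode(CharBuffer.wrap(s))

  // Decode the remaining bytes of the buffer back into a string.
  def fromBytes(b: ByteBuffer): String =
    utf8.newDecoder().decode(b).toString
}
```

The buffer returned by toBytes is ready to be read, so it can be handed directly to fromBytes or written to a channel.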
LineDecoder
Using the java.nio.charset classes described above, we write a LineDecoder class containing a processBytes method that takes as input a ByteBuffer (which is what we have to read into when using a SocketChannel) and converts that byte data to Java characters. For this example, we also break up that character data into separate lines when we see line break characters, converting each line of characters to a Java String. One buffer of data might contain multiple lines of character data, so rather than returning a set of lines, our method accepts a callback to which we pass each line as we decode it.

    import java.nio.{ByteBuffer,CharBuffer}
    import java.nio.charset.{Charset,CharsetDecoder,CharsetEncoder,CoderResult}
    import scala.annotation.tailrec

    class LineDecoder {

        //Encoders and decoders are not multi-thread safe, so create one
        //for each connection in case we are using multiple threads.
        val utf8Charset = Charset.forName("UTF-8")
        val utf8Encoder = utf8Charset.newEncoder
        val utf8Decoder = utf8Charset.newDecoder

        def processBytes(b:ByteBuffer, lineHandler:(String)=>Unit):Unit =
            processChars(utf8Decoder.decode(b),lineHandler)

        @tailrec
        private def processChars(cb:CharBuffer, lineHandler:(String)=>Unit):Unit = {
            val len = lengthOfFirstLine(cb)
            if (len>=0) {
                val ca = new Array[Char](len)
                cb.get(ca,0,len)
                eatLineEnding(cb)
                val line = new String(ca)
                lineHandler(line)
                processChars(cb, lineHandler)   //handle multiple lines
            }
        }

        //Assuming the first character in the buffer is an eol char,
        //consume it and a possible matching CR or LF in case the EOL is 2 chars.
        private def eatLineEnding(cb:CharBuffer):Unit = {
            //Eat the first character and see what it is
            cb.get match {
                case '\n' => if (cb.remaining>0 && cb.charAt(0)=='\r') cb.get
                case '\r' => if (cb.remaining>0 && cb.charAt(0)=='\n') cb.get
                case _ => //ignore everything else
            }
        }

        private def lengthOfFirstLine(cb:CharBuffer):Int = {
            (0 until cb.remaining).find { i =>
                List('\n','\r').indexOf(cb.charAt(i))>=0
            }.getOrElse(-1)
        }
    }

Here is an imperative version of lengthOfFirstLine that does the same thing as the functional version above.
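
    private def lengthOfFirstLine(cb:CharBuffer):Int = {
        val cbLen = cb.remaining
        //A while loop lets return exit the method directly,
        //rather than returning from inside a for-loop closure.
        var i = 0
        while (i < cbLen) {
            val ch = cb.charAt(i)
            if (ch == '\n' || ch == '\r')
                return i
            i += 1
        }
        -1
    }
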
NioConnection
One of the classes shown in my previous post was the NioConnection class, whose responsibilities include processing input data from the client. It does this in the method readAction, which initially looks like this:

    //The old version
    private def readAction(b:ByteBuffer):Unit = {
        b.flip()
        socket.write(b)
        b.clear()
    }

We replace the direct call to socket.write with a call to LineDecoder.processBytes, which is responsible for decoding the input data, and we pass it our new writeLine method that accepts a line of characters and writes it back to the client. We also drop the call to b.clear here: readAction runs at what is effectively the bottom of our readWhile loop, and b.clear is already called at the top of that loop.

    private val lineDecoder = new LineDecoder

    private def readAction(b:ByteBuffer):Unit = {
        b.flip()
        lineDecoder.processBytes(b, writeLine)
    }

    def writeLine(line:String):Unit = {
        socket.write(ByteBuffer.wrap((line+"\n").getBytes("UTF-8")))
    }

Now when we receive some input data, it gets passed to LineDecoder.processBytes, which converts it to characters, breaks it up into separate lines, and calls our writeLine method for each line. The writeLine method uses String.getBytes to convert the characters in the line back to bytes, wraps those bytes into a ByteBuffer and writes them directly to the output channel.
As compared to the example in the previous post, this example should behave the same externally, but we are now passing around Java strings rather than NIO buffers, which, assuming we want to deal with string data rather than binary data, will make it simpler to write the rest of the real application.
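The writeLine method above converts characters to bytes with String.getBytes. As an alternative, the same conversion can be done with a CharsetEncoder. The following is only a sketch: the LineWriter class is hypothetical and not part of the NioConnection class from the previous post, and it writes to any WritableByteChannel so that it can be tried without a socket.

```scala
import java.nio.CharBuffer
import java.nio.channels.WritableByteChannel
import java.nio.charset.{Charset, CharsetEncoder}

// Hypothetical helper class, used only for this illustration.
class LineWriter(out: WritableByteChannel) {
  // One encoder per writer, since encoders are not thread-safe.
  private val encoder: CharsetEncoder = Charset.forName("UTF-8").newEncoder()

  def writeLine(line: String): Unit = {
    // The convenience encode method resets the encoder, encodes all of
    // the characters, and returns a buffer that is ready to be read.
    val bytes = encoder.encode(CharBuffer.wrap(line + "\n"))
    // A channel may accept only part of the buffer on each call,
    // so keep writing until the buffer is drained.
    while (bytes.hasRemaining) out.write(bytes)
  }
}
```

To try it out, wrap a ByteArrayOutputStream with java.nio.channels.Channels.newChannel and pass that to the LineWriter constructor. On a non-blocking socket the simple write loop would spin while the socket is not writable, so a real server would need to suspend until the channel is ready rather than loop.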
Limitations
- As with the example in the previous post, the current example only shows how to use the NIO calls on the read side of the connection. We could use a CharsetEncoder on the write side rather than using String.getBytes and ByteBuffer.wrap.
- Partial input lines (characters not terminated by an EOL character) are ignored by this implementation.
- The example uses the convenience method version of decode, which assumes that the input ByteBuffer contains complete character sequences. It is possible that a multi-byte character sequence will be split such that only the first part of that sequence appears at the end of the input buffer, with the remainder of the sequence appearing at the start of the next buffer of input data. The above implementation will not properly handle this situation. The underlying decode method does handle this situation properly, but the remaining code in this example is not set up for this situation.
- The decode convenience method throws exceptions rather than returning a status code as the full decode method does. Since these exceptions are nowhere caught in the code, such an exception would cause that task to abort. A more robust solution would have a mechanism to catch exceptions or restart an aborted task.
- The example assumes UTF-8 encoding.