Encoding Numbers

One of the most common data types to encode is numbers. This could be a numeric value of any kind: 82 degrees, $145.00 in sales, 34% of capacity, etc. The sections below introduce several strategies for encoding a single scalar value into a binary array.

A Simple Encoder for Numbers

In the simplest approach, we can mimic how the cochlea encodes frequency. The cochlea has hair cells that respond to different but overlapping frequency ranges. A specific frequency will stimulate some set of these cells. We can mimic this process by encoding overlapping ranges of real values as active bits. Let’s say we need to represent a range of values from 0 - 55, and we have 100 bits to represent this space.

This is like mapping a continuous value range of 0-55 onto a binary domain of 0-100 bits. Each scalar value has a corresponding bit associated with it, which we can find by scaling from one domain to another. Once we know the index in the bit array, we can expand the representation by adding more bits on either side of that index. For simplicity, we’re going to offload the actual linear scaling operation to a third-party library called “d3”.
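If you are curious what that scaling amounts to, here is a rough sketch in plain arithmetic. The helper name makeLinearScale is made up for this illustration; it is not a d3 function, and d3's own implementation may differ in detail.

```javascript
// Hypothetical helper (not part of d3): returns a function that maps
// a value from the domain [d0, d1] onto the range [r0, r1] linearly.
function makeLinearScale([d0, d1], [r0, r1]) {
    return value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0)
}

// Value range 0-55 mapped onto bit positions 0-100, and back again
const toBit = makeLinearScale([0, 55], [0, 100])
const toValue = makeLinearScale([0, 100], [0, 55])

console.log(toBit(27.5))  // the middle of the value range
console.log(toValue(50))  // the middle of the bit range
```

The middle of one domain lands on the middle of the other, which is all the encoder needs from the scale.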

Given constant values for w (the number of on bits), n (the total number of bits), and the min and max of the input range, we can write the JavaScript code for this encoder like this:

let d3 = require('d3')

const n = 100
const w = 18
const min = 0
const max = 55

let scale = d3.scaleLinear()
    .domain([min, max])
    .range([0, n])
let reverseScale = d3.scaleLinear()
    .domain([0, n])
    .range([min, max])

function applyBitmaskAtIndex(index) {
    let out = [],
        lowerValue = reverseScale(index - (w/2)),
        upperValue = reverseScale(index + (w/2))

    // For each bit in the encoding, we get the input domain
    // value. Using w, we know how wide the bitmask should
    // be, so we use the reverse scales to define the size
    // of the bitmask. If this index is within the value
    // range, we turn it on.
    for (let i = 0; i < n; i++) {
        let bitValue = reverseScale(i),
            bitOut = 0
        if (lowerValue <= bitValue && bitValue < upperValue) {
            bitOut = 1
        }
        out.push(bitOut)
    }
    return out
}
// Accepts a scalar value within the input domain, returns an
// array of bits representing the value.
function encode(value) {
    // Using the scale, get the corresponding integer
    // index for this value
    let index = Math.floor(scale(value))
    if (index > n - 1) {
        index = n - 1
    }
    return applyBitmaskAtIndex(index)
}
Code Example 1: Given constant values for the input value range and output parameters, a complete scalar encoder.

So, given that value is a scalar number between 0 and 55, the encode() function above will create an encoding 100 bits long with 18 bits on (the bitmask) to represent that specific value. Calling encode(27.5) returns a 100-element array with the bitmask, a block of 1s, in the middle:

[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
Encoding Example 1: The encoding for the scalar value 27.5. Shows an 18-bit long bitmask.

Using only the code shown above, we can create an interactive visualization of this encoder. If you hover over the “scalar value” axis in Figure 1 below, the red line moves and the current value being encoded changes. As the value changes, the encoding beneath it also changes, showing which bits are on (the blue ones) vs off. You can also hover your mouse over the rectangles representing bits in the output encoding to see the range within the scalar input domain that activates each bit.

Figure 1: A value between 0 and 55 is encoded into bits above. Move your mouse over the number line to see the encoding update. Hover over the bits in the encoding to see the value range each bit can represent.

Notice how the range of the on bits in the encoding always encompasses the currently selected value. As you move toward the min and max values, you might notice there is a problem with this representation. The number of on bits shrinks as you approach the min and max values, falling to about half of w at the very edges, because the part of the bitmask that would extend past the end of the array is simply cut off. The size of the encoding changes as the value moves toward the edge, and that breaks one of our major rules established earlier. Principle #4 of encoders states:

The output should have similar sparsity for all inputs and have enough one-bits to handle noise and subsampling.
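To see the problem in numbers rather than in the figure, here is a small sketch that counts the on bits at a few inputs. It assumes the same constants as Code Example 1 and swaps the d3 scales for plain arithmetic stand-ins, so treat it as an approximation of that encoder, not a copy of it.

```javascript
const n = 100, w = 18, min = 0, max = 55

// Plain-arithmetic stand-ins for the two d3 scales (an assumption)
const scale = v => ((v - min) / (max - min)) * n
const reverseScale = i => min + (i / n) * (max - min)

function encode(value) {
    const index = Math.min(Math.floor(scale(value)), n - 1)
    const lower = reverseScale(index - w / 2)
    const upper = reverseScale(index + w / 2)
    const out = []
    for (let i = 0; i < n; i++) {
        const bitValue = reverseScale(i)
        out.push(lower <= bitValue && bitValue < upper ? 1 : 0)
    }
    return out
}

// Number of on bits in an encoding
const countOn = bits => bits.reduce((sum, b) => sum + b, 0)

for (const value of [0, 2, 27.5, 53, 55]) {
    console.log(value, countOn(encode(value)))
}
```

With these parameters the count is 18 in the middle of the range but only 9 at the minimum and 10 at the maximum, so the sparsity is not the same for all inputs.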

Our super simple encoder detailed above is going to need a little more logic to handle the literal “edge cases” of minimum and maximum representations. We can do this by overriding the applyBitmaskAtIndex() function above with another one that accounts for this new behavior we want:

function applyBoundedBitmaskAtIndex(index) {
    let out = [],
        lowerBuffer = reverseScale(w),
        upperBuffer = reverseScale(n - w),
        lowerValue = reverseScale(index - (w/2)),
        upperValue = reverseScale(index + (w/2))

    // For each bit in the encoding, we get the input domain
    // value. Using w, we know how wide the bitmask should be,
    // so we use the reverse scales to define the size of the
    // bitmask. If this index is within the value range, we
    // turn it on.
    for (let i = 0; i < n; i++) {
        let bitValue = reverseScale(i),
            bitOut = 0

        if (lowerValue <= bitValue && bitValue < upperValue) {
            bitOut = 1
        }

        // Keeps the output width from changing size at 
        // min/max values
        if (lowerValue < min && bitValue < lowerBuffer) {
            bitOut = 1
        }
        if (upperValue > max && bitValue >= upperBuffer) {
            bitOut = 1
        }
        out.push(bitOut)
    }
    return out
}
Code Example 2: The only difference between the two encoders is the pair of checks that follows the main range test, along with the two buffer values they rely on. We simply prevent the bitmask from getting smaller with some custom handling at the min and max.

Now when you hover near the min and max values, you’ll see that the size of the representation remains consistent. You might also notice that some bits will now represent more values than others.

Figure 2: Unlike the encodings in Figure 1, the size of all output encodings in this example will be the same because we have manually bounded the edges to force the representation to have a constant sparsity.
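As a quick sanity check on that claim, the sketch below sweeps the whole input range and collects every distinct on-bit count the bounded logic produces. It mirrors the checks in Code Example 2 but, as an assumption, uses plain arithmetic in place of the d3 scales; the name encodeBounded is only for this illustration.

```javascript
const n = 100, w = 18, min = 0, max = 55

// Plain-arithmetic stand-ins for the two d3 scales (an assumption)
const scale = v => ((v - min) / (max - min)) * n
const reverseScale = i => min + (i / n) * (max - min)

function encodeBounded(value) {
    const index = Math.min(Math.floor(scale(value)), n - 1)
    const lowerBuffer = reverseScale(w)
    const upperBuffer = reverseScale(n - w)
    const lower = reverseScale(index - w / 2)
    const upper = reverseScale(index + w / 2)
    const out = []
    for (let i = 0; i < n; i++) {
        const bitValue = reverseScale(i)
        let bit = lower <= bitValue && bitValue < upper ? 1 : 0
        // Pin the bitmask to the first or last w bits at the edges
        if (lower < min && bitValue < lowerBuffer) bit = 1
        if (upper > max && bitValue >= upperBuffer) bit = 1
        out.push(bit)
    }
    return out
}

// Sweep the input range and record each distinct on-bit count
const counts = new Set()
for (let value = min; value <= max; value += 0.25) {
    counts.add(encodeBounded(value).reduce((sum, b) => sum + b, 0))
}
console.log([...counts])
```

If the bounding works, the set contains a single entry equal to w.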

Complete Code Reference

See the complete ScalarEncoder and BoundedScalarEncoder JavaScript classes used in these examples. As an example, the following configuration produces the behavior visualized below.

let encoder = new BoundedScalarEncoder({
    min: 0, max: 50,
    w: 10, n: 100
})
function onNewValue(value) {
    let encoding = encoder.encode(value)
}
Code Example 3: An example of the creation of an encoder and its usage.
Figure 3: The behavior of a bounded encoder that maps a continuous input range of 0-50 to 10 on bits in a 100-bit array.

Output Parameters

Encoders should give users control over the size and sparsity of the encodings they create. Given a constant input range of 0-100, change the w and n values in the visualization below and observe how the output encoding changes.

Figure 4: This visual allows you to change the number of bits in the entire encoding (n) and the number of bits on (w).
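The relationship between the two parameters is simple enough to write down. In this minimal sketch, sparsity is taken to mean the fraction of bits that are on, and the function name is just for illustration.

```javascript
// Sparsity: the fraction of bits that are on in every encoding
function sparsity(w, n) {
    return w / n
}

console.log(sparsity(18, 100))  // parameters from Code Example 1
console.log(sparsity(10, 100))  // parameters from Code Example 3
```

Raising w with n fixed makes encodings denser; raising n with w fixed makes them sparser.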

Encoding by min / max

If you know the input domain for an encoder will remain constant, the easiest way to create an encoder is by defining the minimum and maximum input values. Once an encoder is created, these values cannot be changed, or else encodings created before and after the change will be inconsistent. To see what an encoder configured by min/max values might be like, change the min and max values in the panel below.

Figure 5: Define the input range with min and max.
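To see why changing min or max after the fact causes trouble, consider where a single value lands under two different ranges. This is a sketch using the same floor-and-clamp indexing as Code Example 1; the bitIndex helper is hypothetical.

```javascript
// Hypothetical helper: the bit index a value maps to for a given
// input range and output size, clamped to the last valid index.
function bitIndex(value, min, max, n) {
    const index = Math.floor(((value - min) / (max - min)) * n)
    return Math.min(index, n - 1)
}

console.log(bitIndex(25, 0, 50, 100))   // input range 0-50
console.log(bitIndex(25, 0, 100, 100))  // input range 0-100
```

The same input, 25, maps to bit 50 under the first range and bit 25 under the second, so encodings made with one configuration cannot be compared with encodings made with the other.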

Encoding by bit resolution

It might make more sense to create an encoder based upon the range of values each bit in the output array can represent. That is what we mean by resolution: the range of input values one bit represents in the output space.

Figure 6: The resolution is the range of values that one output bit represents.

The higher the resolution value, the larger the input range, since n stays fixed. This makes sense when you think about each bit containing a larger range of values. For this example, I simply hard-coded the min value to be zero and updated the encoder’s max based upon the resolution value.
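Here is one way that update could look. The formula is an assumption based on the definition above (resolution is the slice of the input range covered by one bit), and both function names are made up for this sketch.

```javascript
const n = 100
const min = 0  // hard-coded, as described above

// Assumed relationship: each of the n bits covers `resolution`
// units of input, so the range spans resolution * n in total.
function maxForResolution(resolution) {
    return min + resolution * n
}

// The inverse: the resolution implied by a given min/max/n
function resolutionFor(lo, hi, bits) {
    return (hi - lo) / bits
}

console.log(maxForResolution(0.5))
console.log(resolutionFor(0, 55, 100))
```

A resolution of 0.5 with 100 bits gives a max of 50, and the 0-55 range from Code Example 1 works out to a resolution of 0.55 per bit.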

Cyclic Encoding

Remember above when we had to deal with the special cases of values being encoded at the beginning and end of the value range so all representations were the same size in the output array? Another way to deal with that is by assuming the entire output array is a continuous space — that it wraps around from the end back to the beginning. We can do this simply by changing how the bitmask around the target index is created:

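applyBitmaskAtIndex(encoding, index) {
    // Turn on w bits starting at index. Starting the loop at 0
    // includes the bit at index itself, so exactly w bits are set.
    for (let i = 0; i < w; i++) {
        let bitIndex = index + i
        // Adjust for out of range, by cycling around
        if (bitIndex >= n) bitIndex = bitIndex - n
        encoding[bitIndex] = 1
    }
    return encoding
}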
Code Example 4: The only code we need to override is the code that applies the bitmask at the specified index.

Wow, our CyclicScalarEncoder is the simplest one so far! As a value starts to approach the end of the encoding space, bits at the beginning of the array will activate, and the block of on bits will loop around the array as the value changes. In the figure below, mouse over the line towards the max value and watch as the bits wrap to the beginning of the array.

Figure 7: Because the block of on bits wraps as you approach the end of this array, it is natural to view this as a circle by choosing the `circle` display option above.

Change the display option in the visualization above to circle. When viewed in this way, the wrapping of the output bit makes more sense. Change the value being encoded by mousing over the value line above and observe the encoding.

We’ll take full advantage of the simplicity of cyclic encoding when we talk about encoding periods of time, as well as in the next section, when we talk about category encoding.
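To try the wrapping behavior without the visualization, here is a minimal standalone sketch. It is not the CyclicScalarEncoder class itself: it assumes the bitmask starts at the value's index, uses plain arithmetic for the scaling, and wraps the max value back to index 0.

```javascript
const n = 100, w = 18, min = 0, max = 55

function encodeCyclic(value) {
    const encoding = new Array(n).fill(0)
    // Map the value to a starting bit; the modulo sends max back to 0
    const index = Math.floor(((value - min) / (max - min)) * n) % n
    for (let i = 0; i < w; i++) {
        // Wrap around to the start of the array when we run off the end
        encoding[(index + i) % n] = 1
    }
    return encoding
}

const nearMax = encodeCyclic(54)
console.log(nearMax.join(''))
```

For an input of 54, the block starts at bit 98, runs off the end of the array, and continues from bit 0, so all 18 bits stay on.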

Discrete Vs Continuous

All the examples shown so far have been of continuous encodings, because all our input ranges have been a continuous scale of numeric values. This also means that numbers near one another on the number line have been represented similarly. For example, in Figure 8 below you can see encodings for two numbers: 4 and 5. Given the encoding parameters defined below, you can see the overlapping bits in green.

Figure 8: The encoding of 4 is in blue and 5 is in yellow. The overlapping bits are green.

This overlap means that these two values are semantically similar. They are represented similarly because their values are close on the number line. Compare the overlap of two close values with that of two values farther from each other: values that are far enough apart on the number line share no bits, so they have no semantic similarity. Try changing the values of w and n while noticing how it affects the overlap between close and far values.
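Overlap is easy to measure: count the positions where both encodings have a 1. The sketch below does that with a plain-arithmetic version of the simple encoder. The parameters are borrowed from Code Example 1 and the input values are picked for illustration; they are not the ones behind Figure 8.

```javascript
const n = 100, w = 18, min = 0, max = 55

// Plain-arithmetic stand-ins for the two d3 scales (an assumption)
const scale = v => ((v - min) / (max - min)) * n
const reverseScale = i => min + (i / n) * (max - min)

function encode(value) {
    const index = Math.min(Math.floor(scale(value)), n - 1)
    const lower = reverseScale(index - w / 2)
    const upper = reverseScale(index + w / 2)
    const out = []
    for (let i = 0; i < n; i++) {
        const bitValue = reverseScale(i)
        out.push(lower <= bitValue && bitValue < upper ? 1 : 0)
    }
    return out
}

// Number of positions where both encodings have an on bit
function overlap(a, b) {
    let shared = 0
    for (let i = 0; i < a.length; i++) {
        if (a[i] === 1 && b[i] === 1) shared++
    }
    return shared
}

console.log(overlap(encode(20), encode(21)))  // close values
console.log(overlap(encode(20), encode(40)))  // distant values
```

With these assumed parameters, 20 and 21 share 16 of their 18 on bits, while 20 and 40 share none.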

Continuous encoding is great for ranges of input values, but sometimes we don’t want encodings to overlap. You might want to separate an encoding space into equal sections that encode different categories of data, like this:

Figure 9: By limiting the input to discrete values and making n an exact multiple of w, it is easy to encode discrete scalar values with a CyclicScalarEncoder.

Accomplishing this is really quite simple if we use logic we’ve already defined for the CyclicScalarEncoder. If we know how many different discrete values we need to encode and the number of bits to use for each, we can do this:

let values = [0,1,2,3,4]
let w = 5
let encoder = new CyclicScalarEncoder({
    w: w,
    n: values.length * w,
    min: 0,
    max: values.length
})
Code Example 5: Use the CyclicScalarEncoder.

This CyclicScalarEncoder is now configured to return discrete encodings for the discrete input values [0,1,2,3,4], but remember that we have to send it integers, not decimal values, in order to get back non-continuous encodings. Watch us turn this little code snippet into a CategoryEncoder in the next section.
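To make the integer requirement concrete, here is a standalone sketch with the same configuration. It does not use the real CyclicScalarEncoder class; encodeCyclic is a stand-in that assumes the bitmask starts at the value's index and wraps at the end of the array.

```javascript
const values = [0, 1, 2, 3, 4]
const w = 5
const n = values.length * w
const min = 0
const max = values.length

// Stand-in for the cyclic encoder (an assumption, not the real class)
function encodeCyclic(value) {
    const encoding = new Array(n).fill(0)
    const index = Math.floor(((value - min) * n) / (max - min)) % n
    for (let i = 0; i < w; i++) {
        encoding[(index + i) % n] = 1
    }
    return encoding
}

// Number of positions where both encodings have an on bit
function overlap(a, b) {
    return a.reduce((shared, bit, i) => shared + (bit & b[i]), 0)
}

console.log(encodeCyclic(2).join(''))
console.log(overlap(encodeCyclic(2), encodeCyclic(3)))    // two integers
console.log(overlap(encodeCyclic(2), encodeCyclic(2.5)))  // integer vs decimal
```

Each integer gets its own block of 5 bits, so 2 and 3 share nothing. The decimal 2.5 lands between blocks and shares 3 bits with the encoding of 2, which is exactly the continuous behavior we wanted to avoid.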

