Encoding Categories

A Quick Review

So far we have created 3 encoders. You’ve seen some code snippets from each one in previous sections, but you might want to see them all laid out and complete before we move onward. So here are the three encoder classes in JavaScript:

Many datasets contain categorical information. In some cases the data consists of discrete, completely unrelated categories (such as the SKUs of products in a store). In other cases, the data consists of categories that may have some relation (such as days of a week). We will first walk through the discrete case. Some examples of this type of categorization:

  • weekend vs weekday
  • holiday vs non-holiday
  • part of speech for a word
  • animal kingdoms
  • Olympic sporting events
  • names of race horses
  • shapes

In these and many other cases it is useful to encode these characteristics as completely discrete categories. The encoding should minimize overlap between the encodings of any two categories. The easiest way to do this is to dedicate some number of bits to each option. The encoding for any option has its dedicated bits active and the rest inactive. Here is an example of the weekday/weekend encoding for today:

Figure 1: An example of what a weekday / weekend encoding for today would look like. Hover over the value bar to change days.
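The dedicated-bits idea can be sketched without any encoder class at all. The snippet below is a minimal illustration, not the tutorial's implementation: encodeDiscrete and overlap are hypothetical helper names, and the order of the categories in the list is an assumption.

```javascript
// Hypothetical helper, not part of the tutorial's encoder classes.
// Gives each category its own run of w bits in an array of
// categories.length * w bits.
function encodeDiscrete(categories, value, w) {
    const index = categories.indexOf(value)
    if (index === -1) {
        throw new Error(`Unknown category: ${value}`)
    }
    const encoding = new Array(categories.length * w).fill(0)
    // Turn on only the w bits dedicated to this category.
    for (let i = 0; i < w; i++) {
        encoding[index * w + i] = 1
    }
    return encoding
}

// Counts the bits that are active in both encodings.
function overlap(a, b) {
    let shared = 0
    for (let i = 0; i < a.length; i++) {
        if (a[i] === 1 && b[i] === 1) shared++
    }
    return shared
}

const dayTypes = ['weekday', 'weekend']
const weekday = encodeDiscrete(dayTypes, 'weekday', 3)
const weekend = encodeDiscrete(dayTypes, 'weekend', 3)

console.log(weekday.join(''))          // 111000
console.log(weekend.join(''))          // 000111
console.log(overlap(weekday, weekend)) // 0
```

Because no bit belongs to more than one category, the overlap between any two different categories is always zero.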

In the previous section, we explained how you can create a CyclicScalarEncoder to encode discrete categories.

let values = [0,1,2,3,4]
let w = 5
let encoder = new CyclicScalarEncoder({
    w: w,
    n: values.length * w,
    min: 0,
    max: values.length
})
Code Example 1: How we hacked a CyclicScalarEncoder to create discrete encodings in the last section.

Now we’re going to convert this into an encoder that can encode any number of categories discretely into encodings of different sizes. We will do this by subclassing CyclicScalarEncoder:

class CategoryEncoder extends CyclicScalarEncoder {

    constructor(opts) {
        // Each category gets its own w bits, so n grows with the category count.
        super({
            w: opts.w,
            n: opts.categories.length * opts.w,
            min: 0,
            max: opts.categories.length,
        })
        this.categories = opts.categories
    }

    encode(value) {
        // Encode the category's position in the list as a scalar.
        let index = this.categories.indexOf(value)
        return super.encode(index)
    }

}
Code Example 2: A real CategoryEncoder class, using the same principles.
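One detail worth noting: Array.prototype.indexOf returns -1 when the value is not in the list, so encode() as written hands -1 to the parent encoder for an unknown category. A minimal sketch of a guard, assuming you would rather fail loudly than encode an out-of-range index; categoryIndex is a hypothetical helper, not something defined in the tutorial:

```javascript
// Hypothetical helper: look up a category's index, rejecting unknown values.
function categoryIndex(categories, value) {
    const index = categories.indexOf(value)
    if (index === -1) {
        throw new Error(
            `Unknown category "${value}"; expected one of: ${categories.join(', ')}`
        )
    }
    return index
}

const categories = ['one', 'two', 'three']
console.log(categoryIndex(categories, 'two')) // 1

try {
    categoryIndex(categories, 'four')
} catch (err) {
    console.log(err.message)
    // Unknown category "four"; expected one of: one, two, three
}
```

Inside CategoryEncoder.encode, a call like this could stand in for the direct indexOf lookup.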

How would we use this new encoder? Like this:

let encoder = new CategoryEncoder({
    w: 3,
    categories: ['one', 'two', 'three']
})

let encoding = encoder.encode('two')
Code Example 3: Using our new CategoryEncoder.

Now encoding should be [0,0,0,1,1,1,0,0,0]: only the three bits dedicated to 'two' are active.
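To try this without the code from the previous section, here is a self-contained sketch. StandInCyclicScalarEncoder is an assumption: a simplified substitute written only for this example, which maps a value to a starting bit and wraps around at the end of the array. The real CyclicScalarEncoder may differ in its details. DemoCategoryEncoder is likewise a hypothetical name for the same class shape as Code Example 2.

```javascript
// ASSUMPTION: a simplified stand-in for the previous section's
// CyclicScalarEncoder, written only so this example runs on its own.
class StandInCyclicScalarEncoder {
    constructor(opts) {
        this.w = opts.w
        this.n = opts.n
        this.min = opts.min
        this.max = opts.max
    }

    encode(value) {
        const out = new Array(this.n).fill(0)
        // Scale the value from [min, max) onto a starting bit in [0, n).
        const start = Math.floor(
            (value - this.min) * this.n / (this.max - this.min)
        )
        // Turn on w bits from the starting bit, wrapping past the end.
        for (let i = 0; i < this.w; i++) {
            out[(start + i) % this.n] = 1
        }
        return out
    }
}

// Same shape as Code Example 2, but built on the stand-in above.
class DemoCategoryEncoder extends StandInCyclicScalarEncoder {
    constructor(opts) {
        super({
            w: opts.w,
            n: opts.categories.length * opts.w,
            min: 0,
            max: opts.categories.length,
        })
        this.categories = opts.categories
    }

    encode(value) {
        return super.encode(this.categories.indexOf(value))
    }
}

const demo = new DemoCategoryEncoder({
    w: 3,
    categories: ['one', 'two', 'three'],
})

for (const name of demo.categories) {
    console.log(name.padEnd(5), demo.encode(name).join(''))
}
// one   111000000
// two   000111000
// three 000000111
```

Under these assumptions each category lights up its own block of three bits and no two categories share a bit.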

Our new CategoryEncoder can be used as a base class for lots of different types of discrete encoders. Here’s one that accepts a JavaScript Date value and produces a binary array based upon whether the date falls on a weekend or not.

class WeekendEncoder extends CategoryEncoder {

    constructor(opts) {
        // The two categories are fixed; only w is configurable.
        super({
            w: opts.w,
            categories: ['weekday', 'weekend'],
        })
    }

    encode(date) {
        // getDay() returns 0 for Sunday through 6 for Saturday.
        let dayOfWeek = date.getDay()
        let value = 'weekday'
        if (dayOfWeek === 0 || dayOfWeek === 6) {
            value = 'weekend'
        }
        return super.encode(value)
    }

}

// Usage

let encoder = new WeekendEncoder({w: 3})
let encoding = encoder.encode(new Date())
Code Example 4: By extending the CategoryEncoder class, we can create new encoders that operate on other data types, like this WeekendEncoder.

Creating new discrete category encoders is straightforward. The one in the code above is the same one running behind Figure 1.

When to use Discrete Encodings

Use discrete encodings when there are no semantic similarities between categories. For example: object types, SKUs, names, keys, literal categories, discrete groupings, animal kingdoms, job types, etc. Just because two people have the same first name does not mean their encodings should share semantics. Human first names could be encoded discretely like so:

Figure 2: The most popular US baby names over the last century as discrete categories.

Limitations of Discrete Categories

TODO: Explain what it means to have too many categories

– as categories grow the size of the semantic encoding gets smaller
– how does this affect the SP’s activations?

Figure 3: Example of different category lengths and how it affects initial connections from the SP.
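Because CategoryEncoder sets n to categories.length * w, the share of bits that belongs to any one category is w / n = 1 / categories.length, whatever w is. A small sketch of that arithmetic follows. The helper name activeFraction is hypothetical, and the snippet says nothing about how the SP responds, which remains the open question above.

```javascript
// Hypothetical helper: fraction of the encoding's bits that are active
// for a single category, given the sizing used by CategoryEncoder
// (n = categoryCount * w).
function activeFraction(categoryCount, w) {
    const n = categoryCount * w
    return w / n
}

for (const count of [2, 10, 100, 1000]) {
    console.log(count, activeFraction(count, 5))
}
// 2 0.5
// 10 0.1
// 100 0.01
// 1000 0.001
```

Each category keeps its w bits, but those bits become a smaller and smaller slice of the whole encoding as the category list grows.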
