"Squished" Identities - Multiple Identities Aliased Together

Identify Squishing

Identity Squishing is a general term that we use when referring to situations where too many Kissmetrics “people” have been merged into a single “person.” You might see a profile listing hundreds of different values for “Customer ID”, or a single strange Customer ID (such as “Anonymous”) recording an unreasonably huge number of events. There are two different types of Identity Squishing (“Identify Squishing” vs “Alias Squishing”), and they are dealt with in different ways.

For Identify Squishing, there’s usually an erroneous line of code identifying multiple users with the same identity. This can happen if you set up code to identify all of your anonymous users as “anonymous,” or “[email protected],” or if you’re trying to identify users using a variable that has some sort of default value (like “none,” “missing,” “null,” etc).

The end result is that many of your users are identified as this bad value, and they’re all recording events under a single profile, showing up as a single user in all of your reports.

_kmq.push(['identify', 'anonymous']);

Unfortunately, there’s no good way to correct the data that has already been recorded in this case. All of the data under this profile has been tied to, in this example, “anonymous,” and there is no way to separate the traffic recorded by different users.

On the other hand, the fix is relatively simple: when a user visits your site, check the user’s current identity. If it is “anonymous,” toss that identity out. The JavaScript library will assign a new one for you.

// Return the value of the cookie named cname, or "" if it is not set.
function getCookie(cname) {
    var name = cname + "=";
    var ca = document.cookie.split(';');
    for (var i = 0; i < ca.length; i++) {
        var c = ca[i];
        // Strip the leading spaces that follow each ';' separator.
        while (c.charAt(0) === ' ') {
            c = c.substring(1);
        }
        if (c.indexOf(name) === 0) {
            return c.substring(name.length, c.length);
        }
    }
    return "";
}
// "value" should be the ID you're filtering for; "domain" should be your site's domain (such as .example.com)
function deleteKMCookies(value, domain) {
    // Substring match: the cookies are cleared whenever km_ni contains the encoded value.
    if (getCookie('km_ni').indexOf(encodeURIComponent(value)) >= 0) {
        // Expire each Kissmetrics identity cookie by setting a date in the past.
        document.cookie = "km_ni=; expires=Thu, 01 Jan 1970 00:00:00 UTC; domain=" + domain + "; path=/";
        document.cookie = "km_ai=; expires=Thu, 01 Jan 1970 00:00:00 UTC; domain=" + domain + "; path=/";
        document.cookie = "km_abi=; expires=Thu, 01 Jan 1970 00:00:00 UTC; domain=" + domain + "; path=/";
    }
}
deleteKMCookies('mybadpattern', '.mydomain.com');

This example is missing some fairly important best practices, such as ensuring that “[email protected]” doesn’t get their identity cleared, but it would work in a pinch.
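
One way to avoid rejecting legitimate identities that merely contain a bad value is to compare whole identities rather than substrings, and to do so before the identify call is ever queued. The following is a minimal sketch of that idea, not an official Kissmetrics recipe: the helper names `isPlaceholderIdentity` and `safeIdentify` and the list of placeholder values are assumptions made for illustration, and only `_kmq` and the `identify` call come from the library.

```javascript
// Hypothetical helpers; only _kmq and the 'identify' call come from Kissmetrics.
var _kmq = _kmq || [];

// Placeholder values based on the examples in this article, plus a few
// common defaults (an assumption; extend this list for your own site).
var PLACEHOLDER_IDS = ['anonymous', 'none', 'missing', 'null', 'undefined', ''];

// True when the value is absent or matches a placeholder exactly, ignoring
// case and surrounding whitespace. An ID that merely contains one of these
// words is treated as a real identity.
function isPlaceholderIdentity(id) {
    if (id === null || id === undefined) {
        return true;
    }
    var normalized = String(id).trim().toLowerCase();
    return PLACEHOLDER_IDS.indexOf(normalized) !== -1;
}

// Queue an identify call only for real-looking IDs.
// Returns true if the call was queued, false if it was skipped.
function safeIdentify(id) {
    if (isPlaceholderIdentity(id)) {
        return false;
    }
    _kmq.push(['identify', String(id)]);
    return true;
}
```

With this in place, a call such as `safeIdentify(currentUserId)` (where `currentUserId` is whatever variable your site uses) would simply be skipped when the variable still holds a default value, leaving the visitor on the anonymous ID the library assigned.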

Alias Squishing

Alias Squishing, despite feeling and looking similar, is a largely different beast from Identify Squishing. In this case, it’s an alias call with an erroneous value, not an identify call. This can happen if you accidentally alias two different users together or, more commonly, when you alias multiple users to the same unrelated value, like “default,” “0,” or “anonymous,” potentially due to a default value or just a simple coding mistake.

You might, for example, see something like the following - the site owner is trying to alias the current user to the value stored in the variable named email, but ends up aliasing to the word ‘email’ instead.

_kmq.push(['alias', 'email']);

This would cause the user’s current identity, say “userA,” and the word “email” to become tied together. Any data recorded by either “userA” or “email” would show up in the same profile. If this happens to just one user, it isn’t a problem. But if userB also gets aliased to “email,” there’s big trouble: data recorded by “userA” and “userB” now shows up under the same profile in Kissmetrics. The two users are considered a single user in all of our reports. Now, imagine that this has happened for every user on your site.
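
The chain reaction described above can be sketched with a toy model. This is only an illustration of why a shared alias merges otherwise unrelated users; it is not how Kissmetrics stores identities, and `createIdentityGraph`, `alias`, and `samePerson` are names invented for this example.

```javascript
// Toy model only: each identity points at a parent identity, and all
// identities that share the same root are treated as one "person".
function createIdentityGraph() {
    var parent = Object.create(null);

    // Follow parent links until reaching the root identity.
    function find(id) {
        if (!(id in parent)) {
            parent[id] = id;
        }
        while (parent[id] !== id) {
            id = parent[id];
        }
        return id;
    }

    return {
        // Tie two identities together, as an alias call does.
        alias: function (a, b) {
            var rootA = find(a);
            var rootB = find(b);
            if (rootA !== rootB) {
                parent[rootA] = rootB;
            }
        },
        // True when both identities resolve to the same profile.
        samePerson: function (a, b) {
            return find(a) === find(b);
        }
    };
}

var graph = createIdentityGraph();
graph.alias('userA', 'email');                   // the buggy call, made for userA
console.log(graph.samePerson('userA', 'userB')); // false: only one user is affected
graph.alias('userB', 'email');                   // the same buggy call, made for userB
console.log(graph.samePerson('userA', 'userB')); // true: the two users are squished
```

Neither user was ever aliased to the other directly; the shared value “email” is what links them, which is why a single bad alias value can eventually pull every visitor into one profile.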

As far as fixing this issue, there are a couple of paths you can take, but they all start in the same place. We’ll cover the basic “put out the fire” steps first:

1. Delete or correct the code responsible. This is the most important step; if this doesn’t happen, all of the following steps are rendered moot, since all of your users will end up squished together again in time.
2. Test and make sure that the code problem is fixed. Make sure you have all of your edge-cases covered.
3. Create a new Kissmetrics product in your account. If you don’t start tracking under a new product, all of your existing users will maintain their squished IDs, and your data will remain inaccurate.
4. Transfer tracking over to the new product. Simply replace the existing API key in your code with the new API key generated by the new product, found in that product’s settings.

These are the basic, initial steps to recovery. If you don’t care about recovering the old data and this is where you want to stop, that’s fine - you can stop reading here. However, unlike Identify squishing, there are steps you can take to correct the data inaccuracies caused by Alias squishing, with various degrees of accuracy. In any case, the next step is:

5. Set up a data export to S3. We will need to move all of your data to an external location to modify and re-import it.
    5a. Move that data to the root of the bucket - we’ll need it there for the re-import process.

From here, you have two choices, more or less. Our “Recurring JSON Import” tool, which imports JSON data from an S3 bucket, provides a “filtering” option that allows you to exclude lines that match a certain word or string. The quickest and easiest shotgun solution is to exclude all lines containing the word “alias,” which will have some repercussions on your data - for instance, you probably won’t be able to tie purchases back to ad campaigns in most cases.

On the other hand, if a more accurate pattern can be found (for instance, if all of these bad aliases contain “anonymous” as a value), you could instead filter on the word “anonymous” to exclude just the bad alias calls, leaving you with, more or less, perfectly accurate data.
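
Before committing to a filter, it can help to preview locally how many lines each candidate string would drop. The sketch below assumes the filter is a plain substring match, as the description above suggests; the exact matching rules of the import tool are not covered here. The function name `excludeLines` and the sample lines are invented for illustration, and a real Kissmetrics export may be shaped differently.

```javascript
// Keep only the lines that do not contain the excluded string
// (plain, case-sensitive substring match; an assumption about the filter).
function excludeLines(lines, excluded) {
    return lines.filter(function (line) {
        return line.indexOf(excluded) === -1;
    });
}

// Invented sample lines; the field names are placeholders, not the export format.
var sampleLines = [
    '{"type":"event","person":"userA","name":"Signed Up"}',
    '{"type":"alias","person":"userA","to":"userA@example.com"}',
    '{"type":"alias","person":"userB","to":"anonymous"}'
];

console.log(excludeLines(sampleLines, 'alias').length);     // 1
console.log(excludeLines(sampleLines, 'anonymous').length); // 2
```

In this sample, filtering on “alias” also discards the legitimate alias for userA, while filtering on “anonymous” removes only the bad one, which is the trade-off between the two approaches.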

6. Set up a JSON import for the corrected data - specifically the numerical recurring import.

Note:

For larger accounts, the numerical import can take a very long time - even several months, depending on the size of the account.

If you have a larger (or older) account, you may want to use the Ruby gem at the bottom of this support article to merge all of the files into a single JSON file, which will expedite the upload process significantly. If you do so, use the “recurring json import” option instead of the “numerical” one.