Looking for an elegant way to remove variables from HTTP metrics

We’re tracking HTTP server metrics and need a view of latency for our services, with path as one of the dimensions.
The trouble with “path” as a dimension is that it WILL have really high cardinality, for two reasons:

  1. Random scanners requesting well known exploitable paths
  2. Resource IDs as variables in the paths

At scale, #2 can be very problematic.

Luckily, the set of “good” paths is known, so it’s relatively easy to filter out all of #1.
The plan here is to create a set of allowlisted regex matches; anything
that doesn’t match gets replaced with the literal
path=$RandomScanner

This enables us to still have a view into how much junk is hitting us but we don’t care what they requested.
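
A minimal sketch of that allowlist step in plain JavaScript, assuming the path is already available as a string. The patterns and the helper name normalizeScannerPath are illustrative assumptions, not a complete list of real routes:

```javascript
// Hypothetical allowlist: anchored patterns for the paths we actually serve.
const ALLOWED_PATHS = [
  /^\/api\/v1\/packs\/[^/]+(\/export)?$/,
  /^\/api\/v1\/p\/[^/]+\/system\/samples\/[^/]+$/,
  /^\/api\/v1\/system\/samples\/[^/]+\/content$/,
];

// Returns the path unchanged when it is allowlisted, else the placeholder.
function normalizeScannerPath(path) {
  return ALLOWED_PATHS.some((re) => re.test(path)) ? path : '$RandomScanner';
}

console.log(normalizeScannerPath('/api/v1/packs/MyCoolPack')); // -> /api/v1/packs/MyCoolPack
console.log(normalizeScannerPath('/wp-login.php'));            // -> $RandomScanner
```

Anchoring each pattern with ^ and $ matters here: without it, a scanner path that merely contains a good path as a substring would slip through.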

For #2, the top contender is to build a series of eval functions with matchers for all the known paths that
have variables in them and replace those variables with literals, such as:

/api/v1/packs/MyCoolPack                       --> /api/v1/packs/$PackId
/api/v1/packs/YourCoolPack                     --> /api/v1/packs/$PackId
/api/v1/packs/NR_Test_Lookup/export            --> /api/v1/packs/$PackId/export
/api/v1/p/NR_Test_Lookup/system/samples/951MMu --> /api/v1/p/$PackId/system/samples/$SampleId
/api/v1/system/samples/UelALs/content          --> /api/v1/system/samples/$SampleId/content
...

It’s tedious and will take time to build while finding all the paths. It leaves me wondering if there’s a better Stream function than stringing multiple evals one after the other to get this done.

2 UpGoats

I think a series of Mask functions would do the trick.

\/api\/v1\/packs\/[^/]+ => '/api/v1/packs/$PackId'
\/samples\/[^/]+ => '/samples/$SampleId'
\/api\/v1\/system\/samples\/[^/]+\/content => '/api/v1/system/samples/$SampleId/content'
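
To see how those rules behave when applied in order, here is a rough emulation using plain String.prototype.replace. It is only a sketch for testing the patterns outside Stream: the names RULES and maskPath are made up for illustration, and it does not claim to reproduce how the Mask function is implemented.

```javascript
// The three rules above as [pattern, replacement] pairs, applied in order.
const RULES = [
  [/\/api\/v1\/packs\/[^/]+/, '/api/v1/packs/$PackId'],
  [/\/samples\/[^/]+/, '/samples/$SampleId'],
  [/\/api\/v1\/system\/samples\/[^/]+\/content/, '/api/v1/system/samples/$SampleId/content'],
];

// A replacer function is used so '$' in the replacement is always taken literally.
function maskPath(path) {
  return RULES.reduce((p, [re, to]) => p.replace(re, () => to), path);
}

console.log(maskPath('/api/v1/packs/NR_Test_Lookup/export'));
// -> /api/v1/packs/$PackId/export
console.log(maskPath('/api/v1/system/samples/UelALs/content'));
// -> /api/v1/system/samples/$SampleId/content
```

Note that with only these three rules, /api/v1/p/NR_Test_Lookup/system/samples/951MMu comes out as /api/v1/p/NR_Test_Lookup/system/samples/$SampleId, so the /api/v1/p/ prefix would need a rule of its own.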

2 UpGoats

I wrote some code that should accomplish your use case, @dshanaghy, and does not require regex. Define your known API paths in the tree variable, marking each substitution with a variable as the key name (e.g. $var). Please customize the code to your liking. The URL should be in a field called url, and the function will output a new object in a field called url_parsed, with any tokens in a separate child object.

// ['api', 'v1', 'packs', 'MyCoolPack']
const tokens = __e['url'].substring(1).split('/');

// Tree of API paths
// At a given level, you may only use static strings or variables (beginning with $)
// The lowest child object must end with an empty object {}
let tree = {
    "api": {
        "v1": {
            "p": {
                "$PackId": {
                    "system": {
                        "samples": {
                            "$SampleId": {}
                        }
                    }
                }
            },
            "packs": {
                "$PackId": {
                    "export": {}
                }
            },
            "system": {
                "samples": {
                    "$SampleId": {
                        "content": {}
                    }
                }
            }
        }
    }
}

// Recursive function -> call on each token from the first line
const test = (a, o, url, tokens) => {
    // Pop the left-most item and leave the rest in an array "rest"
    let [first, ...rest] = a;

    // List of keys at this depth in the object
    let keys = Object.keys(o);

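    if (keys.includes(first)) {
        // found static match
        url = `${url}/${first}`;
    } else if (keys.length === 1 && keys[0].startsWith('$')) {
        // Token/variable replacement
        url = `${url}/${keys[0]}`;
        // Set token KV pair
        tokens[keys[0].substring(1)] = first;
        // Reference the correct API object key, not the variable value
        first = keys[0];
    } else {
        // No match at this depth: stop here, because recursing into o[first]
        // (undefined) would throw. Reuses the $RandomScanner placeholder from the question.
        return {url: '$RandomScanner', tokens: {}};
    }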

    if (rest.length > 0) {
        // There are more tokens to parse
        return test(rest, o[first], url, tokens);
    } else {
        // We're done!
        return {url, tokens};
    }
};

// First call to the recursive function setting initial values
__e['url_parsed'] = test(tokens, tree, '', {});

For example, /api/v1/p/NR_Test_Lookup/system/samples/951MMu would output:

{
  "url": "/api/v1/p/$PackId/system/samples/$SampleId",
  "tokens": {
    "PackId": "NR_Test_Lookup",
    "SampleId": "951MMu"
  }
}
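
The tokenizing line at the top assumes url holds a bare path with a single leading slash. If the field can carry a query string, fragment, or trailing slash, a small pre-processing step along these lines could produce the token array instead. This is a sketch, and pathTokens is a hypothetical helper name:

```javascript
// Hypothetical helper: turn a raw URL path into the token array the tree walker expects.
function pathTokens(rawUrl) {
  // Drop the query string and fragment, if any.
  const pathOnly = rawUrl.split(/[?#]/, 1)[0];
  // Split on '/' and discard empty segments from leading, trailing, or doubled slashes.
  return pathOnly.split('/').filter((segment) => segment.length > 0);
}

console.log(pathTokens('/api/v1/packs/MyCoolPack/?verbose=true'));
// -> [ 'api', 'v1', 'packs', 'MyCoolPack' ]
```
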
1 UpGoat

That’s pretty slick, @bdalpe. Before you posted, I had barreled ahead and brute-forced it with a series of evals. Given that they can all go in one function, it’s more manageable than I first thought, when I assumed it would require multiple functions. Will update to this and give it a shot.

1 UpGoat