JSON formatting for VMware Log Insight


I’ve added some fields (lookups, etc.) to events from a syslog source and am trying to push them into the Log Insight API via a webhook.

But I’m struggling to understand how to get the events into the format Log Insight wants.
I’ll say up front, JSON and JavaScript are pretty new to me, so be gentle :)

The JSON format Log Insight expects is…

{"messages": [{
    "fields": [
        {"name": "Field1", "content": "Field1_value"},
        {"name": "Field2", "content": "Field2_value"},
        {"name": "Field_xxx", "content": "Field_Last_xxx"}
    ],
    "text": "original message",
    "timestamp": timestamp
}]}

I’ve tried the Serialize to JSON, but it doesn’t seem to have a way to arrange the fields for it to output in a custom way.

I’ve seen a post about using Object.fromEntries, but I’m still not able to get it working. I’m not sure if it’s even the right way to go.

It’s probably something simple. Thank you in advance!

Be sure your target object is actually JSON. If it’s text (represented by an ‘a’ in the preview pane), you’ll want to parse it into a JSON object before it hits the destination. An Eval function that sets _raw to JSON.parse(_raw) would work.
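
To illustrate the difference between text and a parsed object, here is a minimal sketch in plain JavaScript, run outside Cribl. The sample payload and the variable names rawText and rawObject are made up for the example.

```javascript
// Hypothetical sample: _raw arrives as a string that happens to contain JSON.
const rawText = '{"text": "original message", "timestamp": 1700000000}';

// While it is a string, its contents cannot be read or changed as properties.
console.log(typeof rawText);   // string

// JSON.parse turns the text into a real object that later functions can modify.
const rawObject = JSON.parse(rawText);
console.log(typeof rawObject); // object
console.log(rawObject.text);   // original message
```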

Note the curly braces next to raw here, indicating a JSON object:


Hi @BargiBargi,

Here’s how I would build a pipeline to format the messages. I’ll use an example event with some fields already extracted. We want to move _raw to the text field, _time to timestamp, and then all remaining fields to fields.

Let’s start with a basic Eval function to build the general structure of an individual message:

(the expression is {"text": _raw, "timestamp": _time, "fields": []})

Now we can use the code function to do some magic… We want to take all fields (using the special variable __e) that do not start with an underscore (internal or otherwise not already used) and move their KV pairs to fields.

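__e['_raw']['fields'] = Object.entries(__e)
    // keep only fields whose names do not start with an underscore
    .filter(([key, value]) => !key.startsWith('_'))
    // reshape each key/value pair into the name/content form
    .map(([key, value]) => {
        return {"name": key, "content": value}
    })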

We use the Object.entries function to create a KV array to work with in the filter and map functions. In the map function, the return value reformats the original KV pair into the expected name and content fields.
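
The two steps above can be tried outside Cribl with a stand-in event. This is only a sketch: the event object below replaces the __e variable, and the NSXT_ field names and values are made up.

```javascript
// Hypothetical event standing in for Cribl's __e; field names and values are made up.
const event = {
  _raw: 'original message',
  _time: 1700000000,
  NSXT_action: 'DROP',
  NSXT_rule: '1024',
};

// Step 1 (the Eval): wrap _raw and _time in the per-message structure.
const message = { text: event._raw, timestamp: event._time, fields: [] };

// Step 2 (the Code function): every key that does not start with '_'
// becomes a {name, content} pair.
message.fields = Object.entries(event)
  .filter(([key]) => !key.startsWith('_'))
  .map(([key, value]) => ({ name: key, content: value }));

console.log(JSON.stringify(message, null, 2));
```

To keep only fields with a given prefix instead, the filter condition could be changed to key.startsWith('NSXT').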

Finally, we can use the Aggregations function to combine events into a single array.

The list(_raw) function will generate a new array of individual messages aggregated together. The evaluate fields expression moves the array into the expected messages object key.

Which gives an output that looks like the following:
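
As a rough illustration of that shape, the sketch below builds the same structure in plain JavaScript. It is not the Aggregations function itself, and the two messages and their values are made up.

```javascript
// Hypothetical per-event messages, as they would look after the Eval and Code steps.
const perEventMessages = [
  { text: 'first message', timestamp: 1700000000, fields: [{ name: 'NSXT_action', content: 'DROP' }] },
  { text: 'second message', timestamp: 1700000001, fields: [{ name: 'NSXT_action', content: 'ALLOW' }] },
];

// Equivalent of list(_raw) plus the evaluate-fields expression:
// collect the messages into one array under the "messages" key.
const aggregated = { messages: perEventMessages };

console.log(JSON.stringify(aggregated, null, 2));
```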

Now in the Webhook destination, configure as follows to only emit the _raw field as the payload to the Log Insight collector. Note the URL is static for the destination, but it can be customized per-event by setting the __url field.

Let me know if this solves your issue!

Thanks so much for the help.

I’ve worked through it and I can see it builds the JSON object up as expected.
But when it sends to the LI API, it fails with the error below.

{"errorMessage":"Invalid request body.","errorCode":"JSON_FORMAT_ERROR","errorDetails":{"reason":"Unrecognized token 'object': was expecting (JSON String, Number, Array, Object or token 'null', 'true' or 'false')\n at [Source: (String)"[object Object]\n[object Object]\n[object Obj…"[truncated 3915 chars]; line: 1, column: 8]"}}

The pipeline output looks fine (the only thing that’s maybe a bit strange is that content is listed before name, but I assume that’s just Cribl sorting alphabetically).

I changed the code to filter on anything starting with NSXT as all the fields I need start with that.

I tried it both with the Aggregations function and with just an Eval, and got the same result.

Live Data for the Webhook destination again looks fine

@BargiBargi could you try adding an eval function to the end of your pipeline that turns _raw into JSON.stringify(_raw)?
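
For context on why this helps: when a JavaScript object is converted to a string implicitly, the result is the literal text [object Object], which matches what the error message shows. A minimal sketch in plain JavaScript, with a made-up payload:

```javascript
// Hypothetical payload in the shape Log Insight expects.
const payloadObject = {
  messages: [{ text: 'original message', timestamp: 1700000000, fields: [] }],
};

// Implicit string conversion throws away the content.
const coerced = String(payloadObject);
console.log(coerced);    // [object Object]

// JSON.stringify produces the actual JSON text to send.
const serialized = JSON.stringify(payloadObject);
console.log(serialized); // {"messages":[{"text":"original message","timestamp":1700000000,"fields":[]}]}
```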

Using JSON.stringify(_raw) I can see events come through when using the Aggregations step, which is good, but now I barely see 1 event a second coming through when there’s much more than that coming in.

I thought it might have been something to do with the Aggregations step, so I replaced it with the following Eval and it works, but again only at a very low rate.

There are no dropped events in the Cribl Webhook, which is strange, so I’m not quite sure what’s going on. Does JSON.stringify have a limit on the rate it can process?


Been trying to sort this on and off and can’t figure out what the issue is.
Without the JSON.stringify(_raw) Cribl outputs [object Object], which can be seen in the last-failed-buffer.raw

With JSON.stringify(_raw) it works, but it’s literally 1 message every 2 seconds.

Any advice would be much appreciated!

Got some time to look at this again. Hopefully it helps someone, although I’m still not sure it’s fully fixed.

As a test I tried Syslog and File as a Destination which both worked fine, even with lookups and other processing added. Cribl was able to output all incoming events fine.

I went back to the Pipeline and Enabled “Cumulative Aggregations”

This brought the throughput up from 1 to 100/sec which was better, but still way off the incoming event count.

Looking at the Aggregations function and the documentation for “list”, there is a limit of 100 results.
I set this to unlimited (0), which proceeded to blow out the size of the message.

Between changing the list limit, the time window (0.25 sec) and the destination max body size, I’ve been able to see about 30-35k messages/sec.

So right now I just need to confirm all the messages are going through, as even at the lower rates nothing was reported as dropped.
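
One way to picture how a count limit and a max body size interact is the batching sketch below. It is plain JavaScript for Node.js (it uses Buffer), not how Cribl implements Aggregations or the Webhook destination; batchMessages and its parameters are hypothetical names, and the sample messages are made up.

```javascript
// Hypothetical helper: split messages into payloads capped by count and serialized size.
// Re-serializing on every step is wasteful, but keeps the sketch short.
function batchMessages(messages, maxCount, maxBytes) {
  const batches = [];
  let current = [];
  for (const msg of messages) {
    const candidate = [...current, msg];
    const size = Buffer.byteLength(JSON.stringify({ messages: candidate }));
    // Close the current batch if adding this message would break either limit.
    if (current.length > 0 && (candidate.length > maxCount || size > maxBytes)) {
      batches.push({ messages: current });
      current = [msg];
    } else {
      current = candidate;
    }
  }
  if (current.length > 0) batches.push({ messages: current });
  return batches;
}

// 250 made-up messages, at most 100 per payload and 1 MiB per body.
const sample = Array.from({ length: 250 }, (_, i) => ({
  text: `event ${i}`,
  timestamp: 1700000000 + i,
  fields: [],
}));
const batches = batchMessages(sample, 100, 1024 * 1024);
console.log(batches.map((b) => b.messages.length)); // [ 100, 100, 50 ]
```

With a cap of 100 per payload, the number of messages delivered per second depends on how many payloads are sent per second, which may be why raising the limit and shortening the time window both increased throughput.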