Filesystem Collector and Event Breaker Inconsistencies

I have created an event breaker rule that works in the Knowledge area of Cribl Stream. But when I run a Filesystem collection job to pick up the same file I used to create the event breaker, it does not break the events correctly.

Event breaker rules in

Event breaker rules out

Filesystem collection results (events do not have the correct fields/values)

I have a feeling your Header Line regex is matching the first #separator line. I can test it out, but I think you would want to change your Header Line to ^#[Ff] so the lines before the #fields line are ignored.
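To illustrate the difference between the two patterns, here is a minimal Python sketch (illustrative only, not Cribl internals) showing why ^# treats every Zeek comment row as a header match, while ^#[Ff] singles out only the #fields row that carries the column names:

```python
import re

# Zeek-style comment rows, shaped like the log discussed in this thread
headers = [
    "#separator \\x09",
    "#set_separator\t,",
    "#path\tconn",
    "#fields\tts\tuid\tproto",
    "#types\ttime\tstring\tenum",
]

broad = re.compile(r"^#")       # matches all five comment rows
narrow = re.compile(r"^#[Ff]")  # matches only the #fields row

matched = [h for h in headers if narrow.match(h)]
```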

Can you validate you’ve committed and deployed?

Dan, that does make sense; I'll reconfigure that one and reply back. I have an example where that would be inconsistent, if that's the case.

This event breaker works, collects, and extracts correctly, even though the first few lines match ^# as well.

Jon, yes I have. On multiple occasions, for each attempt. I had restarted the worker too just in case.

Yeah, sorry, I think the Header Line setting is actually excluding everything that starts with #. What do your first few events look like on the file import?

I want it to exclude the # lines, since those are not events. The first real events are tab-separated, and the field names are in that #fields list. When I tried changing the Header Line to “^#[Ff]”, the event breaker preview completely failed.

Here are the first few lines of the file. With my original settings the import looks fine and the field/value pairs look good, but when run with a Filesystem collector it fails. I’m going to try the Header Line changes that you recommended.

#separator \x09
#set_separator	,
#empty_field	(empty)
#unset_field	-
#path	conn
#open	2021-11-12-12-45-00
#fields	ts	uid	id.orig_h	id.orig_p	id.resp_h	id.resp_p	id.vlan	id.vlan_inner	proto	service	duration	orig_bytes	resp_bytes	conn_state	local_orig	local_resp	missed_bytes	history	orig_pkts	orig_ip_bytes	resp_pkts	resp_ip_bytes	tunnel_parents	orig_cc	resp_cc	suri_ids	community_id
#types	time	string	addr	port	addr	port	int	int	enum	string	interval	count	count	string	bool	bool	count	string	count	count	count	count	set[string]	string	string	set[string]	string
2022-05-01 00:00:00.000012	CUaMDI3N3CtEwGXbX9	46210	443	4020	\N	tcp	\N	65.207075	0	6218	SHR	1	0	0	^hdf	0	0	9	6590	\N	US	US	\N	1:1nbEONdQpmuQtjlL3SSQbc28Wyo=
2022-05-01 00:00:00.000320	CAZzJv4QRVv5Yek7Oh	54935	53	4020	\N	udp	dns	\N	\N	\N	SHR	1	0	0	^d	0	0	1	156	\N	US	CN	\N	1:KJjQRZuB5bkT7+ebSf4FW7RJiL8=
2022-05-01 00:00:00.000432	CdRza81SzhESDDyhI9	58632	443	4020	\N	tcp	ssl	376.280685	1458	6534	S1	1	0	0	ShDd	3	1590	7	6826	\N	US	US	\N	1:ZqDFOlfGk/8wlEO1gmawxhE6YBg=
2022-05-01 00:00:00.001140	CAcMyE40njQ2DatMNc	59755	53	4020	\N	udp	dns	\N	\N	\N	S0	1	0	0	D	1	140	0	0	\N	US	US	\N	1:SeSWa3fEVB/I60glsRug0PmDPys=
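For context, here is a minimal Python sketch (illustrative only, not how Cribl implements its breakers) of what a correct header-based break should produce: the names from the #fields row paired with the tab-separated values of each data row. The shortened rows below are hypothetical stand-ins for the log above.

```python
FIELDS_PREFIX = "#fields\t"

def parse_zeek_tsv(lines):
    """Pair the #fields column names with tab-separated values per data row."""
    fields, events = None, []
    for line in lines:
        if line.startswith(FIELDS_PREFIX):
            fields = line[len(FIELDS_PREFIX):].split("\t")
        elif line.startswith("#"):
            continue  # other comment rows (#separator, #types, ...) carry no data
        elif fields is not None:
            events.append(dict(zip(fields, line.split("\t"))))
    return events

# Hypothetical, shortened rows in the same shape as the log above
log = [
    "#separator \\x09",
    "#fields\tts\tuid\tproto",
    "#types\ttime\tstring\tenum",
    "2022-05-01 00:00:00.000012\tCUaMDI3N\ttcp",
]
events = parse_zeek_tsv(log)
```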

Sorry, I meant: what do the first few events look like in Stream when you run the job? You had a screenshot above, but it started at event 7; I'm just wondering how the first few events look.

Sorry, the first few events are the commented rows from the log. Exactly as they appear in the log.

I ran what you sent above through a collector with what I think is the same breaker as you and it worked.

The next thing I would check is what type of encoding you have on your file. Is it being pulled from a Linux machine? If so, run file testfile.tsv on your test file.

The breaker works for me in the preview, but not when I run an actual collection.

sample.tsv: UTF-8 Unicode (with BOM) text, with CRLF line terminators
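That UTF-8 BOM is worth ruling out: it sits in front of the first # byte and can keep a ^#-anchored Header Line regex from matching the very first row. Below is a small sketch for checking and normalizing a file's leading bytes; this is a generic Python check, not a Cribl feature, and the sample bytes are hypothetical stand-ins shaped like the reported file (BOM plus CRLF line endings).

```python
import codecs

def normalize(raw: bytes) -> bytes:
    """Strip a leading UTF-8 BOM and convert CRLF line endings to LF."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    return raw.replace(b"\r\n", b"\n")

# Hypothetical bytes shaped like the reported file: UTF-8 BOM + CRLF
sample = codecs.BOM_UTF8 + b"#separator \\x09\r\n#fields\tts\tuid\r\n"
clean = normalize(sample)
```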

It is being pulled from an NFS mount (Synology NAS) on a Linux (Ubuntu 20.04) VM.

UTF-8 should be fine.

This is using the breaker I posted above and pulling with a file system collector.

The only difference is my filter. When I add a ‘filetype’ field on the collector and use bro with your filter, it breaks. Where are you adding filetype?

Which is strange since it is showing you are hitting the tsv-bro breaker.

How big is the file you are collecting?

The filetype is just a variable on the path for the Filesystem collector. I have about 15 different “filetypes”. I’m hitting the right breaker, as seen in my collection. I’m on Cribl 3.4.1 for both leader and worker.

It’s a 10,000-line sample, but when the file actually comes in it’s about 10-50 GB.

Can you try bumping your max event size to the maximum, 134217728 (128 MB)? How big are your working IIS logs?

You could also try bumping up the Event Breaker buffer timeout on the Filesystem collector.