Splunk Best Practices

You're reading from   Splunk Best Practices Operational intelligent made simpler

Product type Paperback
Published in Sep 2016
Publisher Packt
ISBN-13 9781785281396
Length 244 pages
Edition 1st Edition
Author Travis Marlette

Unstructured data

The following screenshot is an example of what unstructured data looks like:

[Screenshot: Unstructured data]

These kinds of logs are much harder to get value from, because all of the knowledge must be manually extracted by a Splunk engineer or admin. Splunk will look at your data and attempt to extract things that it believes are fields. However, these often turn out to be nothing like what you or your users want to use to add value to their dashboards.

That being the case, this is where one would need to speak to the developer/vendor of that specific software, and start asking some pointed questions.

In these kinds of logs, before we can start adding the proper value, there are some foundational elements that need to be correct. I'm only going to focus on event breaking here, as the others are covered elsewhere in this book.

  • Time stamping
  • Event breaking
  • Event definitions
  • Field definitions (field mapping)

Event breaking - best practice

With structured data, Splunk will usually recognize the events and break them correctly on its own, as they are nice and orderly.

With unstructured data, in order to make sure we are getting the data in appropriately, the events need to be in some sort of organized chaos, and this usually begins with breaking an event at the appropriate line/character in the log. There are lots of ways to break an event in Splunk (see http://docs.splunk.com/Documentation/Splunk/6.4.1/Admin/Propsconf and search for break). Using the preceding data, we are going to use the timestamp to determine where to break these events, because the first field, which is most often the timestamp, is usually the most effective place to break an event.

There are a few questions to ask yourself when breaking events, and one of the more important is: are these events all on one line, or are there multiple lines in each event? If you don't know the answer, ask the SME (dev/vendor). Things can get messy once data is in, so save yourself a bit of time by asking this question before inputting data.

In the following example, we can see the timestamp is the event delimiter and that there can be multiple lines in an event. This means that we need to break events pre-indexing:

[Screenshot: Event breaking - best practice]

In order to do that, we need to adjust props.conf on our indexer. Doing so will appropriately delineate log events, as shown in the following image:

[Screenshot: Event breaking - best practice]
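
As a rough sketch, a props.conf stanza that breaks multi-line events at a leading timestamp might look like the following. The source type name my_multiline_log and the timestamp layout (2016-09-01 12:00:00) are assumptions made for illustration and are not taken from the example log; LINE_BREAKER, SHOULD_LINEMERGE, TIME_PREFIX, TIME_FORMAT, and MAX_TIMESTAMP_LOOKAHEAD are standard props.conf settings.

```ini
# Hypothetical source type name; substitute your own.
[my_multiline_log]
# Break only where a newline is followed by a timestamp such as
# 2016-09-01 12:00:00 (assumed format). The text matched by the first
# capture group is discarded and marks the event boundary.
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})
# The breaker above already yields whole events, so skip line merging.
SHOULD_LINEMERGE = false
# The timestamp sits at the very start of each event (assumed).
TIME_PREFIX = ^
TIME_FORMAT = %Y-%m-%d %H:%M:%S
# 19 characters covers "YYYY-MM-DD HH:MM:SS".
MAX_TIMESTAMP_LOOKAHEAD = 19
```

If your timestamps look different, the lookahead pattern and TIME_FORMAT would both need to change to match; test against a sample before touching production.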

Note

Adding line breaking at the indexing tier in this way is a method of pre-index event breaking. Data that has already been indexed cannot be removed without cleaning the index.

In this example, we have five indexers in a cluster pool, so using the UI on each of those indexers is not recommended. "Why?" you ask. In short, once you cluster your indexers, most of the files that would end up in $SPLUNK_HOME/etc/ become shared, and they must be pushed as a bundle by the cluster master. Editing them on individual indexers is also not recommended by Splunk support. Try it if you like, but be prepared for some long nights.

Currently, Splunk makes this quite easy to do for an individual file via the UI, though when dealing with a larger infrastructure and multiple indexers, the UI often isn't the best way to administer it. As a tip, if you're an admin and you don't have a personal instance of Splunk installed on your workstation for just this purpose, install one. Testing the features you will implement is best practice for any system.

Best practices

Why should you install an instance of Splunk on your personal workstation, you ask? Because if you bump into an issue where you need to index a dataset that you can't use the UI for, you can get a subset of the data in a file and attempt to ingest it into your personal instance while leveraging the UI and all its neat features. Then just copy all of the relevant settings to your indexers/cluster master. This is how you can do that:

  1. Get a subset of the data. The SME can copy and paste it into an e-mail, send it as an attachment, or get it to you any other way; just get the subset so you can try to input it. Save it to the machine that is running your personal Splunk instance.
  2. Log in to your personal Splunk instance and attempt to input the data. In Splunk, go to Settings | Data Inputs | Files & Directories | New and select your file, which should bring you to a screen that looks like this:

    [Screenshot: Best practices]

  3. Attempt to break your events using the UI.

Now we are going to let Splunk do most of the configuring here. We have three ways to do this:

  1. Auto: Let Splunk do the figuring.
  2. Every Line: Treat every line as a separate event.
  3. Regex...: Use a regular expression to tell Splunk where each event starts.
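
For orientation, the three choices correspond roughly to props.conf settings like the ones below. This is a sketch and not a dump of what the UI writes: the three stanza names and the date pattern are hypothetical, and only one of these approaches would apply to any given source type.

```ini
# Auto: Splunk's default behavior. Lines are merged, and a new event
# starts at each line that contains a recognizable timestamp.
[example_auto]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE_DATE = true

# Every Line: no merging, so each line becomes its own event.
[example_every_line]
SHOULD_LINEMERGE = false

# Regex...: start a new event at lines matching the pattern
# (here, an assumed leading date such as 2016-09-01).
[example_regex]
SHOULD_LINEMERGE = true
BREAK_ONLY_BEFORE = ^\d{4}-\d{2}-\d{2}
```

Seeing the options as settings makes it easier to recognize them later when you read the props.conf that Splunk generates.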

For this example, I'm going to say we spoke to the developer and they confirmed that the timestamp is the event divider. It looks like Auto will do just fine, as Splunk naturally breaks events at timestamps:

[Screenshot: Best practices]

Going down the rest of the options, we can leave the timestamp extraction set to Auto as well, because the timestamp is easily readable in the log.

The Advanced tab is for adding settings manually, but for this example and the information we have, we won't need to worry about it.

When we click the Next button, we can set our source type. Pay attention to the App portion of this for later, as that is where the configuration we are building will be saved:

[Screenshot: Best practices]

Click Save and set all of the other values on the next couple of windows as well, if you like. As this is your personal Splunk instance, it's not terribly important, because you, the Splunk admin, are the only person who will see it.

When you're finished, make sure your data looks the way you expect it to in a search:

[Screenshot: Best practices]

And if you're happy with it (and let's say we are), we can then look at moving this configuration to our cluster.

Remember when I mentioned we should pay attention to the App? That's where the configuration that we want was written. At this point, it's pretty much just copying and pasting.

Configuration transfer - best practice

All of that was only to get Splunk to auto-generate the configuration that you need to break your data, so the next step is just transferring that configuration to a cluster.

You'll need two files for this: the props.conf that Splunk just wrote on your personal instance, and the props.conf on your cluster master. (For those of you unfamiliar, configuration that the cluster master distributes to its peers lives under $SPLUNK_HOME/etc/master-apps/ on the cluster master.)

This was the config that Splunk just wrote in my personal instance of Splunk:

[Screenshot: Configuration transfer - best practice]

Follow these steps to transfer the configuration:

  1. Go to the destination app's props.conf, copy the configuration, and paste it into your cluster master's props.conf under $SPLUNK_HOME/etc/master-apps/ (for example, $SPLUNK_HOME/etc/master-apps/_cluster/local/props.conf), so that it can be distributed to the peers. In the case of our example:
          Copy source file = $SPLUNK_HOME/etc/apps/search/local/props.conf
          Copy dest file = $SPLUNK_HOME/etc/master-apps/_cluster/local/props.conf
    
  2. Change the stanza to your source type in the cluster:
    • When we pasted our configuration into our cluster master, it looked like this:
                        [myUnstructured] 
                        DATETIME_CONFIG = 
                        NO_BINARY_CHECK = true 
                        category = Custom 
                        pulldown_type = true
    • Yet there is no myUnstructured source type in the production cluster. To make these settings take effect on your production source type, just change the name of the stanza. In our example, we will say that the log snippet we received was from a web frontend, and that web_frontend is the name of our production source type.
    • The change would look like this: 
                            [web_frontend] 
                            DATETIME_CONFIG = 
                            NO_BINARY_CHECK = true 
                            category = Custom 
                            pulldown_type = true 
      
  3. Push the cluster bundle via the UI on the cluster master:

    [Screenshot: Configuration transfer - best practice]

  4. Make sure your data looks the way you want it to:

    [Screenshot: Configuration transfer - best practice]
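
As a reading aid for step 2, here is the renamed stanza again with comments. The comments are an interpretation of each key and were not generated by Splunk. Note that nothing in the stanza sets event breaking explicitly, because the Auto option relies on Splunk's default behavior.

```ini
# The stanza name must match the production source type.
[web_frontend]
# Left empty by the UI in this example; kept exactly as generated.
DATETIME_CONFIG =
# Allow files that look binary to be processed.
NO_BINARY_CHECK = true
# Hints for Splunk Web's source type picker; they do not affect
# how events are broken.
category = Custom
pulldown_type = true
```

If you had chosen Every Line or Regex... instead, you would expect to see line-breaking settings in this stanza as well, and those would need to be carried across in the same way.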

Once we have it coming into our index in some sort of reasonable chaos, we can begin extracting knowledge from events.
