The definition of Splunk
"Splunk is an American multinational corporation headquartered in San Francisco, California, which produces software for searching, monitoring, and analyzing machine-generated big data, via a web-style interface."
--http://en.wikipedia.org/wiki/Splunk
The company Splunk (which is a reference to cave exploration) was started in 2003 by Michael Baum, Rob Das, and Erik Swan, and was founded to pursue a disruptive new vision of making machine-generated data easily accessible, usable, and valuable to everyone.
Machine data (one of the fastest growing segments of big data) is defined as any information that is automatically created without human intervention. This data can come from a wide range of sources, including websites, servers, applications, networks, and mobile devices, and it can span multiple environments, including cloud-based ones.
Splunk (the product) runs from both a standard command line and an entirely web-based interface (which means that no thick client application needs to be installed to access and use the tool), and it performs large-scale, high-speed indexing on both historical and real-time data.
Splunk does not need to restore any of the original data in order to search it. Instead, it stores a compressed copy of the original data (along with its indexing information), which allows you to delete or move the original data. Splunk then uses this searchable repository to efficiently create graphs, reports, alerts, dashboards, and detailed visualizations.
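The idea of keeping a compressed copy of each event together with indexing information can be illustrated with a toy sketch. This is not Splunk's actual storage format or API; the class name `ToyIndex` and its methods are hypothetical, and the sketch only assumes Python's standard `zlib` and `re` modules.

```python
import re
import zlib
from collections import defaultdict


class ToyIndex:
    """Toy searchable repository: compressed raw events plus an inverted index.

    Hypothetical illustration only; it does not mirror Splunk internals.
    """

    def __init__(self):
        self._raw = []                     # compressed copy of each event
        self._postings = defaultdict(set)  # token -> ids of events containing it

    def add(self, line: str) -> int:
        """Store a compressed copy of `line` and index its tokens."""
        event_id = len(self._raw)
        self._raw.append(zlib.compress(line.encode("utf-8")))
        for token in re.findall(r"\w+", line.lower()):
            self._postings[token].add(event_id)
        return event_id

    def search(self, term: str) -> list:
        """Return the original text of every event containing `term`."""
        ids = sorted(self._postings.get(term.lower(), ()))
        return [zlib.decompress(self._raw[i]).decode("utf-8") for i in ids]
```

Once an event has been added, a search is answered entirely from the repository's own compressed copy, so the source file is no longer needed for it.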
Splunk's main product is Splunk Enterprise, or simply Splunk, which was developed using C/C++ and Python for maximum performance and which utilizes its own Search Processing Language (SPL) for maximum functionality and efficiency.
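SPL's syntax borrows from the UNIX pipeline, where each stage consumes the output of the stage before it. The following Python sketch imitates that shape with a filtering stage feeding an aggregation stage. It is an analogy only, not SPL and not a Splunk API; the function names and the sample events are invented for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical sample events; each event is modeled as a dict of fields.
SAMPLE_EVENTS = [
    {"host": "web01", "status": 500},
    {"host": "web01", "status": 200},
    {"host": "web02", "status": 500},
    {"host": "web01", "status": 500},
]


def search(events: Iterable[dict], **criteria) -> Iterator[dict]:
    """Filtering stage: pass through events whose fields match all criteria."""
    for event in events:
        if all(event.get(key) == value for key, value in criteria.items()):
            yield event


def stats_count_by(events: Iterable[dict], field: str) -> dict:
    """Aggregation stage: count events per distinct value of `field`."""
    return dict(Counter(event.get(field) for event in events))


def errors_by_host(events: Iterable[dict]) -> dict:
    # Same shape as a pipeline: <filter on status=500> | <count by host>
    return stats_count_by(search(events, status=500), "host")


if __name__ == "__main__":
    print(errors_by_host(SAMPLE_EVENTS))
```

Because the filtering stage is a generator, events flow through one at a time, much as lines flow between processes in a shell pipeline.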
The Splunk documentation describes SPL as follows:
"SPL is the search processing language designed by Splunk® for use with Splunk software. SPL encompasses all the search commands and their functions, arguments, and clauses. Its syntax was originally based upon the UNIX pipeline and SQL. The scope of SPL includes data searching, filtering, modification, manipulation, insertion, and deletion."
Keeping it simple
You can install Splunk in minutes using standard installers, on a developer laptop, an enterprise server, and (almost) everything in between. It doesn't require any external packages and drops cleanly into its own directory (usually c:\Program Files\Splunk on Windows). Once it is installed, you can check out the readme file, splunk.txt (found in that folder), to verify the version number of the build you just installed and where to find the latest online documentation.
Note that at the time of writing this book, simply going to the website http://docs.splunk.com will provide you with more than enough documentation to get you started with any of the Splunk products, and all of the information is available to be read online or to be downloaded in the PDF format in order to print or read offline. In addition, it is a good idea to bookmark Splunk's Splexicon for further reference. Splexicon is a cool online portal of technical terms that are specific to Splunk, and all the definitions include links to related information from the Splunk documentation.
After installation, Splunk is ready to be used. There are no additional integration steps required for Splunk to handle data from particular products. To date, Splunk simply works on almost any kind of data or data source that you might have access to, but should you require some assistance, there is a Splunk professional services team that can answer your questions or even deliver specific integration services. This team has reportedly helped customers integrate with technologies such as Tivoli, Netcool, HP OpenView, BMC PATROL, and Nagios.
Single-machine deployments of Splunk (where a single Splunk instance, or server, handles everything, including data input, indexing, searching, reporting, and so on) are generally used for testing and evaluations. Even when Splunk is to serve a single group or department, it is far more common to distribute functionality across multiple Splunk servers.
For example, you might have one or more Splunk instances to read input data, one or more for indexing, and others for searching and reporting. Many other factors can determine the roles and number of Splunk instances implemented, such as the following:
- Applicable purpose
- Type of data
- Specific activity focus
- Work team or group to serve
- Grouping of a set of knowledge objects (note that the definition of knowledge objects can vary greatly and is the subject of multiple discussions throughout this book)
- Security
- Environmental uses (testing, developing, and production)
In an enterprise environment, Splunk doesn't have to be (and wouldn't be) deployed directly on a production server. For information's sake, if you do choose to install Splunk on a server to read local files or files from local data sources, the CPU and network footprints are typically the same as if you were tailing those same files and piping the output to Netcat (or reading from the same data sources). The Splunk server's memory footprint for just tailing files and forwarding them over the network can be less than 30 MB of resident memory. To be complete, you should know that some installations, depending on expected usage, will require more resources.
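The "tail a file and pipe it to Netcat" workload that this comparison refers to can be sketched in a few lines of Python. This is a minimal illustration of the workload itself, not of how Splunk forwards data; the function name `forward_new_lines` is hypothetical, and the caller is assumed to supply an already connected socket.

```python
import socket


def forward_new_lines(path: str, offset: int, sock: socket.socket) -> int:
    """Send complete lines appended to `path` since `offset`.

    Returns the offset to resume from on the next call. A trailing partial
    line is held back until its newline has been written.
    """
    with open(path, "rb") as log_file:
        log_file.seek(offset)
        chunk = log_file.read()
    end = chunk.rfind(b"\n") + 1  # 0 when the chunk holds no complete line
    if end:
        sock.sendall(chunk[:end])
    return offset + end
```

A caller would typically invoke this in a loop with a short sleep between calls, carrying the returned offset forward. Only the bytes appended since the last call are held in memory, which is why this kind of workload stays small.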
In medium- to large-scale Splunk implementations, it is common to find multiple instances (or servers) of Splunk, perhaps grouped and categorized by a specific purpose or need (as mentioned earlier).
These different deployment configurations can completely alter the look, feel, and behavior of a Splunk installation. Such deployments or groups of configurations might be referred to as Splunk apps; arguably, however, Splunk apps offer much more ready-to-use configuration than deployments that you have configured yourself based on your requirements.