This article is written by Douglas Berdeaux, the author of Penetration Testing with Perl. Open source intelligence (OSINT) refers to intelligence gathering from open, public sources. These sources include search engines, the client target's web-accessible software or sites, social media sites and forums, Internet routing and naming authorities, public information sites, and more. If done properly and thoroughly, OSINT can strengthen both social engineering and remote exploitation attacks on our client target as we search for ways to gain access to their systems and buildings during a penetration test.
In this article, we will cover how to gather the following information using Perl:

E-mail addresses and employee information from search engines and social media sites such as Google+, LinkedIn, and Facebook
Network ranges from the Whois database
Hostnames and DNS records, including zone transfers
Trace route information for hosts that relay our traffic
Networked hardware information from Shodan
Physical address information
To gather this data, we rely heavily on the LWP::UserAgent Perl module. We will also discover how to use this module over a Secure Sockets Layer/Transport Layer Security (SSL/TLS, that is, HTTPS) connection. In addition to this, we will learn about a few new Perl modules that are listed here:

LWP::Protocol::https
Net::Whois::Raw
Net::DNS::Dig
Net::DNS
Net::Traceroute
XML::LibXML
Before we use Google for intelligence gathering, we should briefly touch upon Google dorks, which we can use to refine and filter our Google searches. A Google dork is a string of special syntax that we pass to Google's request handler using the q= parameter. A dork can comprise operators and keywords separated by a colon, with strings concatenated using the plus symbol + as a delimiter. Here are a few simple Google dorks that we can use to narrow our Google searches:

site: restricts the results to a specific site or domain
intitle: restricts the results to pages whose titles contain the given text
inurl: restricts the results to pages whose URLs contain the given text
filetype: restricts the results to a given file type, such as PDF or XLS
This is just a small list; a complete guide to Google search operators can be found on Google's support pages. A large collection of Google dorks used for information gathering can be found in the Google Hacking Database at http://www.exploit-db.com/google-dorks/.
Getting e-mail addresses from our target can be a rather hard task, and it can also mean gathering usernames used within the target's domain, remote management systems, databases, workstations, web applications, and much more. As we can imagine, gathering a username is 50 percent of the intrusion when harvesting target credentials; the other 50 percent is the password information. So how do we gather e-mail addresses from a target? Well, there are several methods; the first we will look at is simply using search engines to crawl the web for anything useful, including forum posts, social media, e-mail lists for support, web pages and mailto links, and anything else cached or found by ever-spidering search engines.
Automating queries to search engines is usually best left to application programming interfaces (APIs). We might be able to query the search engine via a simple GET request, but this leaves plenty of room for error, and the search engine can potentially block our IP address temporarily or force us to prove that we are human by solving a CAPTCHA, as it might assume that we are running a bot. Unfortunately, Google only offers a paid version of their general search API. They do, however, offer an API for custom search, but this is restricted to specified domains. We want to be as thorough as possible and search as much of the web as we can, time permitting, when intelligence gathering. Let's go back to our LWP::UserAgent Perl module and make a simple request to Google, searching for any e-mail addresses and URLs from a given domain. The URLs are useful as they can be spidered to within our application if we feel inclined to extend the reach of our automated OSINT. In the following examples, we want to impersonate a browser as much as possible so as not to raise flags at Google by using automation. We accomplish this by using the LWP::UserAgent Perl module and spoofing a valid Firefox user agent:
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use LWP::Protocol::https;
my $usage = "Usage ./email_google.pl <domain>";
my $target = shift or die $usage;
my $ua = LWP::UserAgent->new;
my %emails = (); # unique
my $url = 'https://www.google.com/search?num=100&start=0&hl=en&meta=&q=%40%22'.$target.'%22';
$ua->agent("Mozilla/5.0 (Windows; U; Windows NT 6.1 en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18");
$ua->timeout(10); # set up a timeout
$ua->show_progress(1); # display progress bar
my $res = $ua->get($url);
if($res->is_success){
  my @urls = split(/url\?q=/,$res->as_string);
  foreach my $gUrl (@urls){ # Google URLs
    next if($gUrl =~ m/(webcache\.googleusercontent)/i or not $gUrl =~ m/^http/);
    $gUrl =~ s/&sa=U.*//;
    print $gUrl,"\n";
  }
  my @emails = $res->as_string =~ m/[a-z0-9_.-]+\@/ig;
  foreach my $email (@emails){
    if(not exists $emails{$email}){
      print "Possible Email Match: ",$email,$target,"\n";
      $emails{$email} = 1; # hashes are faster
    }
  }
}else{
  die $res->status_line;
}
The LWP::UserAgent module used in the previous code is not new to us. We did, however, add SSL support using the LWP::Protocol::https module. Our $url object is a simple Google search URL that anyone would browse to with a normal browser. The num= parameter pertains to the number of results returned by Google on a single page, which we have set to 100.
To further pose as a browser, we needed to set the user agent with the agent() method, which we did as a Mozilla browser. After this, we set a timeout and a Boolean to show a simple progress bar. The rest is just simple Perl string manipulation and pattern matching. We use the regular expression url\?q= to split the string returned by the as_string method of the $res object. Then, for each URL string, we use another regular expression, &sa=U.*, to remove the excess analytics garbage that Google appends.
Then, we simply parse out all e-mail addresses found using the same method but a different regular expression. We stuff all matches into the @emails array and loop over them, displaying them on our screen if they don't already exist in the %emails Perl hash.
Let's run this program against the weaknetlabs.com domain and analyze the output:
root@wnld960:~# perl email_google.pl weaknetlabs.com
** GET https://www.google.com/search?num=100&start=0&hl=en&meta=&q=%40%22weaknetlabs.com%22 ==> 200 OK (1s)
http://weaknetlabs.com/
http://weaknetlabs.com/main/%3Fpage_id%3D479
…
http://www.securitytube.net/video/2039
Possible Email Match: Douglas@weaknetlabs.com
Possible Email Match: weaknetlabs@weaknetlabs.com
root@wnld960:~#
This is the (trimmed) output when we run an automated Google search for an e-mail address from weaknetlabs.com.
Now, let's turn our attention to using social media sites such as Google+, LinkedIn, and Facebook to try to gather e-mail addresses using Perl. Social media sites can sometimes reflect information about an employee's attitude towards their employer, their status within the company, position, e-mail addresses, and more. All of this information is considered OSINT and can be useful when advancing our attacks.
We can also search plus.google.com for contact information from users belonging to our target. The following is the URL-encoded Google dork we will use to search the Google+ profiles for an employee of our target:
intitle%3A"About+-+Google%2B"+"Works+at+'.$target.'"+site%3Aplus.google.com
The URL-encoded symbols are as follows: %3A represents a colon (:), and %2B represents a plus symbol (+).
The plus symbol + is a special component of Google dorks, as we mentioned in the previous section. The intitle keyword tells Google to display results whose HTML <title> tag contains the About - Google+ text. Then, we add the string (in quotations) "Works at " (notice the space at the end), followed by the target name as the string object $target. The site keyword tells the Google search engine to only display results from the plus.google.com site. Let's implement this in our Perl program and see what results are returned for Google employees:
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use LWP::Protocol::https;
my $ua = LWP::UserAgent->new;
my $usage = "Usage ./googleplus.pl <target name>";
my $target = shift or die $usage;
$target =~ s/\s/+/g; # encode spaces as plus symbols for Google
my $gUrl = 'https://www.google.com/search?safe=off&noj=1&sclient=psy-ab&q=intitle%3A"About+-+Google%2B"+"Works+at+'.$target.'"+site%3Aplus.google.com&oq=intitle%3A"About+-+Google%2B"+"Works+at+'.$target.'"+site%3Aplus.google.com';
$ua->agent("Mozilla/5.0 (Windows; U; Windows NT 6.1 en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18");
$ua->timeout(10); # set up a timeout
my $res = $ua->get($gUrl);
if($res->is_success){
  foreach my $string (split(/url\?q=/,$res->as_string)){
    next if($string =~ m/(webcache\.googleusercontent)/i or not $string =~ m/^http/);
    $string =~ s/&sa=U.*//;
    print $string,"\n";
  }
}else{
  die $res->status_line;
}
This Perl program is quite similar to our last search program. Now, let's run this to find possible Google employees. Since a target client company can have spaces in its name, we accommodate them by encoding them for Google as plus symbols:
root@wnld960:~# perl googleplus.pl google
https://plus.google.com/%2BPaulWilcox/about
https://plus.google.com/%2BNatalieVillalobos/about
...
https://plus.google.com/%2BAndrewGerrand/about
root@wnld960:~#
The preceding (trimmed) output proves that our Perl script works as we browse to the returned results. These two Google search scripts provided us with some great information quickly. Let's move on to another example, not using Google but LinkedIn, a social media site for professionals.
LinkedIn can provide us with the contact information and IT skill levels of our client target during a penetration test. Here, we will focus on the contact information. By now, we should feel very comfortable making any type of web request using LWP::UserAgent and parsing its output for intelligence data. In fact, this LinkedIn example should be a breeze. The trick is fine-tuning our filters and regular expressions to get only relevant data. Let's just dive right into the code and then analyze some sample output:
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use LWP::Protocol::https;
my $ua = LWP::UserAgent->new;
my $usage = "Usage ./googlepluslinkedin.pl <target name>";
my $target = shift or die $usage;
my $gUrl = 'https://www.google.com/search?q=site:linkedin.com+%22at+'.$target.'%22';
my %lTargets = (); # unique
$ua->agent("Mozilla/5.0 (Windows; U; Windows NT 6.1 en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18");
$ua->timeout(10); # set up a timeout
my $google = getUrl($gUrl); # one and ONLY call to Google
foreach my $title ($google =~ m/\shref="\/url\?.*">[a-z0-9_. -]+\s?.\b.at $target..\b.\s-\slinked/ig){
  my $lRurl = $title;
  $title =~ s/.*">([^<]+).*/$1/;
  $lRurl =~ s/.*url\?.*q=(.*)&sa.*/$1/;
  print $title,"-> ".$lRurl."\n";
  my @ln = split(/\15?\12/,getUrl($lRurl)); # split the LinkedIn page on line endings
  foreach(@ln){
    if(m/title="/i){
      my $link = $_;
      $link =~ s/.*href="([^"]+)".*/$1/;
      next if exists $lTargets{$link};
      $lTargets{$link} = 1;
      my $name = $_;
      $name =~ s/.*title="([^"]+)".*/$1/;
      print "\t",$name," : ",$link,"\n";
    }
  }
}
sub getUrl{
  sleep 1; # pause between requests...
  my $res = $ua->get(shift);
  if($res->is_success){
    return $res->as_string;
  }else{
    die $res->status_line;
  }
}
The preceding Perl program makes one query to Google to find all possible positions from the target; for each position found, it queries LinkedIn to find employees of the target. The regular expressions used were finely crafted after inspection of the returned HTML object from a simple query to both Google and LinkedIn.
This is a great example of how we can spider off from our initial Google results to gather even more intelligence using Perl automation. Let's take a look at some sample outputs from this program when used against Walmart.com:
root@wnld960:~# perl linkedIn.pl Walmart
Buyer : http://www.linkedin.com/title/buyer/at-walmart/
	Jason Kloster : http://www.linkedin.com/in/jasonkloster
	Rajiv Ahirwal : http://www.linkedin.com/in/rajivahirwal
...
Store manager : http://www.linkedin.com/title/store%2Bmanager/at-walmart/
	Benjamin Hunt 13k+ (LION) #1 Connected Leader at Walmart : http://www.linkedin.com/in/benjaminhunt01
...
Shift manager : http://www.linkedin.com/title/shift%2Bmanager/at-walmart/
	Frank Burns : http://www.linkedin.com/pub/frank-burns/24/83b/285
...
Assistant store manager : http://www.linkedin.com/title/assistant%2Bstore%2Bmanager/at-walmart/
	John Cole : http://www.linkedin.com/pub/john-cole/67/392/b39
	Crystal Herrera : http://www.linkedin.com/pub/crystal-herrera/92/74a/97b
root@wnld960:~#
The preceding (trimmed) output provided some great insight into employee positions, and even real employees in those positions of the target, with a simple call to one script.
All of this information is publicly available information and we are not directly attacking Walmart or its employees; we are just using this as an example of intelligence-gathering techniques during a penetration test using Perl programming.
This information can further be used for reporting, and we can even extend this data into other areas of research. For instance, we can easily follow the LinkedIn links with LWP::UserAgent and pull even more data from the publicly available LinkedIn profiles. This data, when compared to Google+ profile data and simple Google searches, should help in providing a background to create a more believable pretext for social engineering.
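Following one of those profile links is just another LWP::UserAgent request. The following is a minimal sketch of the idea, not the book's own tool: the script name liprofile.pl is hypothetical, the profile URL is taken from the earlier output, and the <title> regular expression is an assumption about the profile page's markup at the time, so it may need tuning after inspecting the real HTML:

#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use LWP::Protocol::https;
my $url = shift or die "Usage: perl liprofile.pl <profile URL>";
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows; U; Windows NT 6.1 en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18");
$ua->timeout(10); # set up a timeout
my $res = $ua->get($url);
die $res->status_line unless $res->is_success;
# the <title> regexp below is an assumption about the page markup:
if($res->as_string =~ m/<title>\s*([^<]+?)\s*<\/title>/i){
  print "Profile title: ",$1,"\n";
}

Running it as perl liprofile.pl http://www.linkedin.com/in/jasonkloster would print whatever the profile advertises in its page title, which often includes the person's name, position, and location.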
Now, let's see if we can use Google to search more social media websites for information on our client target.
We can easily argue that Facebook is one of the largest social networking sites around at the time of writing this book. Facebook can easily return a large amount of data about a person, and we don't even have to go to the site to get it! We can easily extend our reach into the web with the employee names gathered from our previous code by searching Google using the site:facebook.com parameter and the exact same syntax as in the first example of the E-mail address gathering section. The following are a few simple Google dorks that can possibly reveal information about our client target:
site:facebook.com "manager at target" site:facebook.com "ceo at target" site:facebook.com "owner of target" site:facebook.com "experience at target"
This information can return customer and employee criticism that can be used for a wide array of penetration-testing purposes, including social engineering pretexting. We can narrow our focus even further by adding other keywords and strings from our previously gathered intelligence, such as city names, company names, and more. Just about anything returned can be compiled into a unique wordlist for password cracking, and contrasted with known data using Digital Credential Analysis (DCA).
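As a rough sketch of that wordlist idea, we can reuse LWP::UserAgent to fetch the results of one of the preceding dorks and dump every unique word to standard output. The script name wordlist.pl is hypothetical, the dork used is just one of the examples above, and the four-character minimum word length is an arbitrary choice:

#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use LWP::Protocol::https;
my $target = shift or die "Usage: perl wordlist.pl <target name>";
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows; U; Windows NT 6.1 en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18");
$ua->timeout(10); # set up a timeout
# illustrative dork; any of the dorks listed above could be substituted here:
my $url = 'https://www.google.com/search?num=100&q=site:facebook.com+%22at+'.$target.'%22';
my $res = $ua->get($url);
die $res->status_line unless $res->is_success;
my $html = $res->as_string;
$html =~ s/<[^>]+>/ /g; # crude tag stripping
my %words = (); # unique words only
foreach my $word ($html =~ m/\b([a-z]{4,})\b/ig){ # arbitrary 4-character minimum
  print $word,"\n" unless exists $words{lc $word};
  $words{lc $word} = 1;
}

The output can be redirected to a file and fed directly to a password-cracking tool as a candidate wordlist.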
The Domain Name System (DNS) is used to translate hostnames into IP addresses so that we can use easy-to-remember alphanumeric addresses instead of IP addresses for websites or services. It makes our lives a lot easier when typing in a URL with a name rather than a 4-byte numerical value. Any client target can potentially have full control over their naming services. DNS A records can be assigned to any IP address, and with control over the domain, we can easily write our own record for a private class A address, such as 10.0.0.1, which is commonly done on internal networks to allow users to easily connect to different internal services.
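Before scripting anything elaborate, a single forward (A record) lookup only takes a couple of lines of Perl using the core Socket module and the built-in gethostbyname() function. This is a minimal sketch; the script name resolve.pl is hypothetical:

#!/usr/bin/perl -w
use strict;
use Socket; # core module; provides inet_ntoa() for the packed address
my $host = shift or die "Usage: perl resolve.pl <hostname>";
# gethostbyname() in scalar context returns the packed IPv4 address:
my $packed = gethostbyname($host) or die "cannot resolve $host\n";
print $host," has address ",inet_ntoa($packed),"\n";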
Sometimes, when we get an IP address for a client target, we can pass this IP address to the Whois database, and in return, we get the range of IP addresses in which our IP lies and the organization that owns that range. If the organization is our target, then we now know a range of IP addresses pointing directly to their resources. Usually, this information is given to us during a penetration test, and limits are set on how far we are allowed to go with IP ranges, so this step can be limited simply to reporting. Let's use Perl and the Net::Whois::Raw module to interact with the American Registry for Internet Numbers (ARIN) database for an IP address:
#!/usr/bin/perl -w
use strict;
use Net::Whois::Raw;
die "Usage: perl netRange.pl <IP Address>" unless $ARGV[0];
foreach(split(/\n/,whois(shift))){
  print $_,"\n" if(m/^(netrange|orgname)/i);
}
The preceding code, when run, should produce information about the network range and the organization name that owns the range. It is very simple, and it can be compared to calling the whois program from the Linux command line. If we were to script this to run through a number of different IP addresses and run the Whois query against each one, we could be violating the terms of service set by ARIN. Let's test it and see what we get with a random IP address:
root@wnld960:~# perl whois.pl 198.123.2.22
NetRange: 198.116.0.0 - 198.123.255.255
OrgName: National Aeronautics and Space Administration
root@wnld960:~#
This is the output from our Perl program, which reveals an IP range that can belong to the organization listed.
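If the engagement scope does call for looking up several addresses, a slower approach with a pause between queries is safer with respect to ARIN's terms of service. The following is a minimal sketch of that idea under stated assumptions: ips.txt is a hypothetical file with one IPv4 address per line, and the five-second pause is an arbitrary choice:

#!/usr/bin/perl -w
use strict;
use Net::Whois::Raw;
# assumes a hypothetical file named ips.txt with one IPv4 address per line:
open(my $fh,"<","ips.txt") or die "cannot open ips.txt";
while(my $ip = <$fh>){
  chomp $ip;
  next unless $ip =~ m/^\d{1,3}(\.\d{1,3}){3}$/; # skip anything that isn't a dotted quad
  foreach(split(/\n/,whois($ip))){
    print $_,"\n" if(m/^(netrange|orgname)/i);
  }
  sleep 5; # arbitrary pause between queries out of respect for the Whois servers
}
close($fh);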
If this fails, and we need to find more than one hostname owned by our client target, we can try a brute force method that simply checks our name servers; we will do just that in the next section.
DIG stands for domain information groper, and it is a utility that does just that using DNS queries. The dig Linux utility has effectively replaced the older host and nslookup utilities. When making these queries, one thing to note is that if we don't specify a name server to use, the dig utility will simply use the Linux OS default resolver. We can, however, pass a name server to dig; we will cover this in the upcoming Zone transfers section. There is a nice object-oriented Perl module for dig that we will examine, called Net::DNS::Dig. Let's quickly look at an example to query our DNS with this module:
#!/usr/bin/perl -w
use Net::DNS::Dig;
use strict;
my $dig = new Net::DNS::Dig();
my $dom = shift or die "Usage: perl dig.pl <domain>";
my $dobj = $dig->for($dom, 'A');
print $dobj->sprintf; # print entire dig query response
print "CODE: ",$dobj->rcode(1),"\n"; # dig response code
my %mx = Net::DNS::Dig->new()->for($dom,'MX')->rdata();
while(my($val,$server) = each(%mx)){
  print "MX: ",$server," - ",$val,"\n";
}
The preceding code is simple. We create a dig object, $dig, and call the for() method, passing the domain name (which we pulled by shifting the command-line arguments) and 'A' as the record type. We print the returned response with sprintf(), and then the response code alone with the rcode() method. Finally, we create a hash object, %mx, from the rdata() method, which we call on the object returned by making a new Net::DNS::Dig object and calling its for() method with a type of MX for the mail servers. Let's try this against a domain and see what is returned:
root@wnld960:~# perl dig.pl weaknetlabs.com
; <<>> Net::DNS::Dig 0.12 <<>> -t a weaknetlabs.com.
;;
;; Got answer.
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34071
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;weaknetlabs.com.		IN	A
;; ANSWER SECTION:
weaknetlabs.com.	300	IN	A	198.144.36.192
;; Query time: 118 ms
;; SERVER: 75.75.76.76# 53(75.75.76.76)
;; WHEN: Mon May 19 18:26:31 2014
;; MSG SIZE rcvd: 49 -- XFR size: 2 records
CODE: NOERROR
MX: mailstore1.secureserver.net - 10
MX: smtp.secureserver.net - 0
The output is just as expected. Everything above the line starting with CODE is the response from making the DIG query. CODE is returned from the rcode() method. Since we passed a true value to rcode(), we got a string type, NOERROR, returned. Next, we printed the key and value pairs of the %mx Perl hash, which displayed our target's e-mail server names.
Keeping the previous lesson in mind, and knowing that Linux offers a great wealth of networking utilities, we might be inclined to write our own DNS brute force tool to enumerate any possible A records that our client target could have made prior to our penetration test. Let's take a quick look at the nslookup utility we can use to check if a record exists:
trevelyn@wnld960:~$ nslookup admin.warcarrier.org
Server:		75.75.76.76
Address:	75.75.76.76#53
Non-authoritative answer:
Name: admin.warcarrier.org
Address: 10.0.0.1
trevelyn@wnld960:~$ nslookup admindoesntexist.warcarrier.org
Server: 75.75.76.76
Address: 75.75.76.76#53
** server can't find admindoesntexist.warcarrier.org: NXDOMAIN
trevelyn@wnld960:~$
This is the output of two calls to nslookup, the networking utility used to return IP addresses for hostnames and vice versa. The first A record check was successful, and the second, for the admindoesntexist subdomain, was not. We can easily see from this output how we can parse it to check whether a subdomain exists, as in the sketch that follows. We can also see from the two subdomains that, for efficiency, we can use a simple word list of commonly used subdomains before trying many possible combinations.
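A quick way to script that check, before we move to a pure Perl resolver in the next example, is to shell out to nslookup and look for the NXDOMAIN status in its output. This is a minimal sketch: the script name nscheck.pl is hypothetical, and the string matched is based on the output shown previously:

#!/usr/bin/perl -w
use strict;
my $host = shift or die "Usage: perl nscheck.pl <hostname>";
my $out = `nslookup $host 2>&1`; # call the system utility and capture its output
if($out =~ m/NXDOMAIN/){
  print $host," does not exist\n";
}else{
  print $host," exists\n";
}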
A lot of intelligence gathering might have already been done for us by search engines such as Google. In fact, the site: keyword search can return more than just the www subdomain. If we broaden our num= URL GET parameter and loop through all possible results by incrementing the start= parameter, we can potentially get results from other subdomains of our target.
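The following is a minimal sketch of that pagination idea: the script name subenum.pl is hypothetical, the five-page cap and two-second pause are arbitrary choices, and the subdomain-matching regular expression is deliberately loose, so expect some false positives:

#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use LWP::Protocol::https;
my $target = shift or die "Usage: perl subenum.pl <domain>";
my $ua = LWP::UserAgent->new;
$ua->agent("Mozilla/5.0 (Windows; U; Windows NT 6.1 en-US; rv:1.9.2.18) Gecko/20110614 Firefox/3.6.18");
$ua->timeout(10); # set up a timeout
my %subs = (); # unique subdomains seen so far
for(my $start = 0; $start < 500; $start += 100){ # arbitrary five-page cap
  my $res = $ua->get('https://www.google.com/search?num=100&start='.$start.'&q=site:'.$target);
  last unless $res->is_success;
  foreach my $sub ($res->as_string =~ m/([a-z0-9-]+\.\Q$target\E)/ig){
    print $sub,"\n" unless exists $subs{lc $sub};
    $subs{lc $sub} = 1;
  }
  sleep 2; # pause between pages to look less like a bot
}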
Now that we have seen the basic query for a subdomain, let's turn our focus to use Perl and a new Perl module, Net::DNS, to enumerate a few subdomains:
#!/usr/bin/perl -w
use strict;
use Net::DNS;
my $dns = Net::DNS::Resolver->new;
my @subDomains = ("admin","admindoesntexist","www","mail","download","gateway");
my $usage = "perl domainbf.pl <domain name>";
my $domain = shift or die $usage;
my $total = 0;
dns($_) foreach(@subDomains);
print $total," records tested\n";
sub dns{ # search subdomains:
  $total++; # record count
  my $hn = shift.".".$domain; # construct hostname
  my $dnsLookup = $dns->search($hn);
  if($dnsLookup){ # successful lookup
    my $t=0;
    foreach my $ip ($dnsLookup->answer){
      return unless $ip->type eq "A" and $t<1; # A records
      print $hn,": ",$ip->address,"\n"; # just the IP
      $t++;
    }
  }
  return;
}
The preceding Perl program loops through the @subDomains array and calls the dns() subroutine on each entry, which prints any successful query and then returns. The $t integer token is used because some subdomains have several identical A records, and we want to avoid repetition in the program's output. After this, we simply print the total number of records tested. This program can easily be modified to open a word list file, and we can loop through each line by passing it to the dns() subroutine with something similar to the following:
open(FLE,"file.txt"); while(<FLE>){ dns($_); }
As we have seen, the A record for the admin.warcarrier.org entry provided us with some insight into the IP range of the internal network, namely the private class A address 10.0.0.1. Sometimes, when client targets control and host their own name servers, they accidentally allow DNS zone transfers from their name servers to public name servers, providing attackers with information about where the target's resources are. Let's use the Linux host utility to check for a DNS zone transfer:
[trevelyn@shell ~]$ host -la warcarrier.org beth.ns.cloudflare.com
Trying "warcarrier.org"
Using domain server:
Name: beth.ns.cloudflare.com
Address: 2400:cb00:2049:1::adf5:3a67#53
Aliases:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20461
;; flags: qr aa; QUERY: 1, ANSWER: 13, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;warcarrier.org.			IN	AXFR

;; ANSWER SECTION:
warcarrier.org.		300	IN	SOA	beth.ns.cloudflare.com. warcarrier.org. beth.ns.cloudflare.com. 2014011513 18000 3600 86400 1800
warcarrier.org.		300	IN	NS	beth.ns.cloudflare.com.
warcarrier.org.		300	IN	NS	hank.ns.cloudflare.com.
warcarrier.org.		300	IN	A	50.97.177.66
admin.warcarrier.org.	300	IN	A	10.0.0.1
gateway.warcarrier.org.	300	IN	A	10.0.0.124
remote.warcarrier.org.	300	IN	A	10.0.0.15
partner.warcarrier.org.	300	IN	CNAME	warcarrier.weaknetlabs.com.
calendar.warcarrier.org.	300	IN	CNAME	login.secureserver.net.
direct.warcarrier.org.	300	IN	CNAME	warcarrier.org.
warcarrier.org.		300	IN	SOA	beth.ns.cloudflare.com. warcarrier.org. beth.ns.cloudflare.com. 2014011513 18000 3600 86400 1800

Received 401 bytes from 2400:cb00:2049:1::adf5:3a67#53 in 56 ms
[trevelyn@shell ~]$
As we see from the output of the host command, we have found a successful DNS zone transfer, which provided us with even more hostnames used by our client target. This attack has provided us with a few CNAME records, which are used as aliases to other servers owned or used by our target, the subnet (class A) IP addresses used by the target, and even the name servers used. We can also see that the default name, direct, used by CloudFlare.com is still set for the cloud service to allow connections directly to the IP of warcarrier.org, which we can use to bypass the cloud service.
The host command requires the name server, in our case beth.ns.cloudflare.com, before performing the transfer. What this means for us is that we will need the name server information before querying for a potential DNS zone transfer in our Perl programs. Let's see how we can use Net::DNS for the entire process:
#!/usr/bin/perl -w
use strict;
use Net::DNS;
my $usage = "perl dnsZt.pl <domain name>";
die $usage unless my $dom = shift;
my $res = Net::DNS::Resolver->new; # DNS object
my $query = $res->query($dom,"NS"); # query method call for name servers
if($query){ # query of NS was successful
  foreach my $rr (grep{$_->type eq 'NS'} $query->answer){
    $res->nameservers($rr->nsdname); # set the name server
    print "[>] Testing NS Server: ".$rr->nsdname."\n";
    my @subdomains = $res->axfr($dom);
    if ($#subdomains > 0){
      print "[!] Successful zone transfer:\n";
      foreach (@subdomains){
        print $_->name."\n"; # returns a Net::DNS::RR object
      }
    }else{ # 0 returned domains
      print "[>] Transfer failed on " . $rr->nsdname . "\n";
    }
  }
}else{ # something went wrong:
  warn "query failed: ", $res->errorstring,"\n";
}
The preceding program, which uses the Net::DNS Perl module, will first query for the name servers used by our target and then test each one for a DNS zone transfer. The grep() function returns to the foreach() loop a list of all name server (NS) records found. The foreach() loop then simply attempts the DNS zone transfer (AXFR) and prints the results if the returned array has more than zero elements. Let's test the output on our client target:
[trevelyn@shell ~]$ perl dnsZt.pl warcarrier.org
[>] Testing NS Server: hank.ns.cloudflare.com
[!] Successful zone transfer:
warcarrier.org
warcarrier.org
admin.warcarrier.org
gateway.warcarrier.org
remote.warcarrier.org
partner.warcarrier.org
calendar.warcarrier.org
direct.warcarrier.org
[>] Testing NS Server: beth.ns.cloudflare.com
[>] Transfer failed on beth.ns.cloudflare.com
[trevelyn@shell ~]$
The preceding (trimmed) output is a successful DNS zone transfer on one of the name servers used by our client target.
With knowledge of how to glean hostnames and IP addresses from simple queries using Perl, we can take the OSINT a step further and trace our route to the hosts to see what potential target-owned hardware can intercept or relay traffic. For this task, we will use the Net::Traceroute Perl module. Let's take a look at how we can get the IP host information from relaying hosts between us and our target, using this Perl module and the following code:
#!/usr/bin/perl -w
use strict;
use Net::Traceroute;
my $dom = shift or die "Usage: perl tracert.pl <domain>";
print "Tracing route to ",$dom,"\n";
my $tr = Net::Traceroute->new(host=>$dom,use_tcp=>1);
for(my $i=1;$i<=$tr->hops;$i++){
  my $hop = $tr->hop_query_host($i,0);
  print "IP: ",$hop," hop time: ",$tr->hop_query_time($i,0),
    "ms hop status: ",$tr->hop_query_stat($i,0),
    " query count: ",$tr->hop_queries($i),"\n" if($hop);
}
In the preceding Perl program, we used the Net::Traceroute Perl module to perform a trace route to the domain given as a command-line argument. The module is used by first calling the new() method, which we do when defining $tr as a query object. We tell the trace route object $tr that we want to use TCP and also pass the host, which we shift from the command-line arguments. We can pass many more parameters to the new() method, one of which is debug=>9 to debug our trace route; a full list can be obtained from the CPAN Search page of the Perl module at http://search.cpan.org/~hag/Net-Traceroute/Traceroute.pm. The hops method, which returns an integer value of the hop count, is used when constructing the for() loop. We then loop through all hops and print statistics using the hop_query_host method for the IP address of the host, hop_query_time for the time taken to reach the host (on our lab machines, it is returned in milliseconds), and hop_query_stat, which returns the status of the query as an integer value that can be mapped to the export list of Net::Traceroute according to the module's documentation. Now, let's test this trace route program with a domain and check the output:
root@wnld960:~# sudo perl tracert.pl weaknetlabs.com
Tracing route to weaknetlabs.com
IP: 10.0.0.1 hop time: 0.724ms hop status: 0 query count: 3
IP: 68.85.73.29 hop time: 14.096ms hop status: 0 query count: 3
IP: 69.139.195.37 hop time: 19.173ms hop status: 0 query count: 3
IP: 68.86.94.189 hop time: 31.102ms hop status: 0 query count: 3
IP: 68.86.87.170 hop time: 27.42ms hop status: 0 query count: 3
IP: 50.242.150.186 hop time: 27.808ms hop status: 0 query count: 3
IP: 144.232.20.144 hop time: 33.688ms hop status: 0 query count: 3
IP: 144.232.25.30 hop time: 38.718ms hop status: 0 query count: 3
IP: 144.232.229.46 hop time: 31.242ms hop status: 0 query count: 3
IP: 144.232.9.82 hop time: 99.124ms hop status: 0 query count: 3
IP: 198.144.36.192 hop time: 30.964ms hop status: 0 query count: 3
root@wnld960:~#
The output from tracert.pl is just as we expected using the traceroute program of the Linux shell. This functionality can be easily built right into our port scanner application.
Shodan is an online resource that can be used to search for hardware within a specific domain. For instance, a search for hostname:<domain> will return all the hardware entities found within that domain. Shodan is both a public and open source resource for intelligence. Harnessing the full power of Shodan and returning multipage queries is not free, but for the examples in this article, the first page of query results, which is free, was sufficient to provide a suitable amount of information. The returned output is XML, and Perl has some great utilities to parse XML. Luckily, for the purpose of our example, Shodan offers a sample query result for us to use as export_sample.xml. This XML file contains one node per host, labeled host. This node contains attributes for the corresponding host, and we will use the XML::LibXML::Node class from the XML::LibXML Perl module to read them. First, we will download the XML file and use XML::LibXML to open the local file with the parse_file() method, as shown in the following code:
#!/usr/bin/perl -w
use strict;
use XML::LibXML;
my $parser = XML::LibXML->new();
my $doc = $parser->parse_file("export_sample.xml");
foreach my $host ($doc->findnodes('/shodan/host')) {
  print "Host Found:\n";
  my @attribs = $host->attributes('/shodan/host');
  foreach my $host (@attribs){ # get host attributes
    print $host =~ m/([^=]+)=.*/," => ";
    print $host =~ m/.*"([^"]+)"/,"\n";
  } # next
  print "\n\n";
}
The preceding Perl program will open the export_sample.xml file and navigate through the host nodes using the simple XPath expression /shodan/host. For each <host> node, we call the attributes() method from the XML::LibXML::Node class, which returns an array of all attributes with information such as the IP address, hostname, and more. We then run a regular expression pattern on the $host string to parse out the key, and another regexp to get the value. Let's see how this returns data from our sample XML file from ShodanHQ.com:
root@wnld960:~#perl shodan.pl
Host Found:
hostnames => internetdevelopment.ro
ip => 109.206.71.21
os => Linux recent 2.4
port => 80
updated => 16.03.2010
Host Found:
ip => 113.203.71.21
os => Linux recent 2.4
port => 80
updated => 16.03.2010
Host Found:
hostnames => ip-173-201-71-21.ip.secureserver.net
ip => 173.201.71.21
os => Linux recent 2.4
port => 80
updated => 16.03.2010
The preceding output is from our shodan.pl Perl program. It loops through all host nodes and prints the attributes.
As we can see, Shodan can provide us with some very useful information that we can possibly use to exploit later in our penetration testing. It's also easy to see, without going into elementary Perl coding examples, that we can find exactly what we are looking for from an XML object's attributes using this simple method. We can use this code for other resources as well.
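As an alternative to the regular expressions used previously, XML::LibXML can also hand us each attribute as an object, which makes the parsing a little less fragile. Here is a minimal sketch of the same loop using the nodeName and value accessors of the returned attribute objects; the behavior should match the earlier program's output, assuming the sample file carries plain attributes with no namespaces:

#!/usr/bin/perl -w
use strict;
use XML::LibXML;
my $doc = XML::LibXML->new()->parse_file("export_sample.xml");
foreach my $host ($doc->findnodes('/shodan/host')){
  print "Host Found:\n";
  foreach my $attr ($host->attributes()){
    # each entry is an XML::LibXML::Attr object with name/value accessors:
    print $attr->nodeName," => ",$attr->value,"\n";
  }
  print "\n";
}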
Gaining information about the actual physical address is also important during a penetration test. Sure, this is public information, but where do we find it? Well, the PTES describes how most states require a legal entity of a company to register with the State Division, which can provide us with a one-stop go-to place for the physical address information, entity ID, service of process agent information, and more. This can be very useful information on our client target. If obtained, we can extend this intelligence by finding out more about the property owners for physical penetration testing and social engineering by checking the city/county's department of land records, real estate, deeds, or even mortgages. All of this data, if hosted on the Web, can be gathered by automated Perl programs, as we did in the example sections of this article using LWP::UserAgent.
As we have seen, being creative with our information-gathering techniques can really shine with the power of regular expressions and the ability to spider links. As we learned in the introduction, it's best to do an automated OSINT gathering process along with a manual process because both processes can reveal information that one might have missed.