Google’s dark side
Our last stop for goodies before we arrive at the desk where Kali eagerly awaits is Google. No, we’re not going to check the weather or find out why we call those spiky animals porcupines (apparently, it’s the Latin porcus (hog) and spina (thorn, spine) – who knew?). We’ll leverage the surgical scalpel of Google searching: operators. Keep the same spirit from Shodan – separate the operator from the query with a colon (:) and no spaces. Google, however, allows us to get pretty advanced.
Badda-bing
The concepts here apply to the Bing search engine as well (though you’ll want to review the operator specifics on their help pages). Because Bing is a distinct search engine, you may find results there that you won’t find on Google, and vice versa. It’s worth checking all your options!
Google’s advanced operators
Let’s first discuss what makes up an ordinary web page. Of course, you have the URL to type into your browser and to share with your friends. Then, you have the title of the page, and the distinction is technical – it will be explicitly formatted this way with the <title> tag in HTML. You’ll also have the text of the page, which is basically everything written on the page that isn’t the title or the URL. There are three reasons why we pen testers care about this:
- Google can find stuff left on pages by administrators who may have neglected to understand the public nature of their posts – including talking about specific clients and the products they manage.
- Google can find stuff left on pages by bad guys who may have already compromised your client, a partner, or an employee.
- Services with web portals will have signatures that can distinguish them: the use of specific words (such as admin) in the URL, or a product, version, or company name in the text of the page, and so on.
Google is designed for the average user, using its snazzy algorithm to find what you want, and even what you didn’t realize you wanted. However, it is ready for the advanced user, too. You just need to know what to say to it. There are two ways of doing this: with operators directly, or within the Advanced Search feature. Let’s take a look at the different operators for direct use:
- intitle: Return pages with your query within the page title.
- inurl: Return pages with your query inside the URL to the page itself.
- allintitle: The allin queries are special – they will only return results that contain all of your multiple keywords. For example, allintitle:"Satoshi" "identity" "bitcoin" "conspiracy" will return pages that contain all four words somewhere in the title, but not pages that have only three of those words in the title.
- allinurl: This will only return results where all of your terms are contained in the URL.
- allintext: Return only the pages that contain all of your terms in the text of the page.
- filetype: A particularly powerful option that lets you specify the file type. For example, filetype:pdf will return PDF documents with your search criteria.
- link: Another special fine-tuning option, this searches for pages that contain links to the URL or domain you specify here.
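If you find yourself typing the same operators again and again, it can help to assemble the queries in a script. What follows is only a minimal sketch in Python: the helper names (operator, all_in, and search_url) are made up for illustration, and the google.com/search?q= address is assumed to be the same one your browser shows after a normal search.

```python
from urllib.parse import quote_plus

# Operators from the list above that take a single value.
SINGLE_OPERATORS = {"intitle", "inurl", "filetype", "link"}
# Operators from the list above that require every listed term to match.
ALLIN_OPERATORS = {"allintitle", "allinurl", "allintext"}


def operator(name: str, value: str) -> str:
    """Join an operator and its value with a colon and no spaces around it."""
    if name not in SINGLE_OPERATORS:
        raise ValueError(f"unsupported operator: {name}")
    # Quote multi-word values so the whole phrase stays attached to the operator.
    if " " in value:
        value = f'"{value}"'
    return f"{name}:{value}"


def all_in(name: str, terms: list[str]) -> str:
    """Build an allin* query in which every quoted term must be present."""
    if name not in ALLIN_OPERATORS:
        raise ValueError(f"unsupported operator: {name}")
    return f"{name}:" + " ".join(f'"{term}"' for term in terms)


def search_url(query: str) -> str:
    """URL-encode a query so it can be pasted into the address bar."""
    # Assumption: the standard search endpoint; adjust if yours differs.
    return "https://www.google.com/search?q=" + quote_plus(query)


if __name__ == "__main__":
    print(operator("filetype", "pdf"))
    print(operator("intitle", "index of"))
    print(all_in("allintitle", ["Satoshi", "identity", "bitcoin", "conspiracy"]))
    print(search_url(operator("filetype", "pdf") + " statement"))
```

Keeping the operator names in two sets means a typo raises an error right away instead of silently producing a query that Google treats as plain keywords.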
Just like with Shodan, you can negate an option with a dash (-). For example, I can look for the word explorer and avoid pages about the car with explorer -ford. You can also look for pages that contain one or more of several terms (as opposed to the allin options) with the OR operator. For example, the following will only return pages that contain all four of the quoted terms:
allintext:"Satoshi" "identity" "bitcoin" "conspiracy"
However, the next example will return pages that mention any of the terms:
"Satoshi" OR "identity" OR "bitcoin" OR "conspiracy"
A useful shorthand for OR, by the way, is the pipe character (|). So, this is identical to the previous search:
"Satoshi" | "identity" | "bitcoin" | "conspiracy"
The Advanced Search page
Google has made things a little more user-friendly – just add advanced_search after the google.com URL, as shown in the following screenshot:
For some advanced search capabilities, this accomplishes the same thing as putting the operators directly into the search box. However, narrowing results down to a specific date range is best done from the results page. First, enter your search query; then, click Tools, followed by the Any time dropdown, to select a custom range, as shown here:
I remember needing to use the daterange: operator with Julian dates. For instance, Christmas Day of 2020 was on Julian Day 2,459,209. Trust me, using a graphical calendar is much less annoying.
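If you’re curious where a number like that comes from, the arithmetic is easy to script. This is a rough sketch rather than anything official: the function names are hypothetical, the constant is the offset between Python’s date ordinal and the Julian Day Number, and the daterange:START-END layout is assumed from how the operator was historically written.

```python
from datetime import date

# Offset between Python's proleptic Gregorian ordinal (0001-01-01 is day 1)
# and the Julian Day Number that begins at noon UTC on that calendar date.
JDN_OFFSET = 1_721_425


def julian_day_number(day: date) -> int:
    """Return the integer Julian Day Number for a calendar date."""
    return day.toordinal() + JDN_OFFSET


def daterange(start: date, end: date) -> str:
    """Build a daterange: term from two calendar dates (assumed syntax)."""
    if end < start:
        raise ValueError("end date must not be earlier than start date")
    return f"daterange:{julian_day_number(start)}-{julian_day_number(end)}"


if __name__ == "__main__":
    print(julian_day_number(date(2020, 12, 25)))  # 2459209
    print(daterange(date(2020, 12, 1), date(2020, 12, 25)))
```

The first print reproduces the Christmas Day figure quoted above, which is a handy sanity check on the offset.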
Thinking like a dark Googler
I’ve had a lot of financial organizations as pen test clients. The nature of their business involves a lot of paperwork, so it’s particularly tricky to keep everything tidy. Let’s take a look at a possible Google hacking mission, in this case, digging up financial information. Of course, for your needs, you’ll be using your client’s name or the name of an employee to accompany your fine-tuned search terms.
First, I try the following:
intitle:"index of" "Parent Directory" ".pdf" "statement"
Let’s break this down. By looking for index of in the title along with the words Parent Directory somewhere on the page, I’ll be finding exposed file directories that are hosted via HTTP/S. I’m also looking for any text with .pdf in it, which will catch directories hosting PDF files. Finally, I’m hoping someone will have put the word statement somewhere in their filename. As you can imagine, we’ll probably grab some false positives with this. But you may also find things like this, which I’m fairly certain was not intended to be sitting on the open web:
Looks like someone’s going on a trip! This find didn’t have statement in its filename, but the files next to it did. When I click Parent Directory on some of these pages, I end up at the home page for the domain or a 404 page, strongly suggesting that these exposed directories are accidents. There’s nothing quite like a false sense of security to help you out in your endeavor. Finding an employee’s passports, tax returns, and the like, before you even sit down with your Kali toolkit, is a powerful message for your client’s management.
There are plenty of resources online to help you with sneaky Google searches. The Google Hacking Database over at the Exploit Database (exploit-db.com) is an excellent place to check out. I won’t rehash all the different searches you could try. The key lesson here is to apply whatever information you have on your client and try thinking in terms of how a resource presents itself to the internet. For example, I had a client for whom my initial research suggested the presence of a Remote Desktop portal. Searching the client’s domain with this was helpful:
inurl:RDWeb/Pages/en-US/login.aspx
How did I come up with that? Simple: I researched how these devices work. Find one, talk to it with your browser, and build a Google query with your client’s information.
Have you considered your client’s IT support? We all need to ask for help now and then. Perhaps some of the IT staff at your client have asked for support online. “Hmm, I’m not sure,” a helpful compatriot replies, “can you upload a packet dump from the device?” Next thing you know, information deeply internal to your client has been exfiltrated to the web. I’ve seen it with clients more times than I’d like to admit. Just look for those communities and try combining parts of the URLs with inurl. For example, if you see your client’s name pop up along with the following, then you have a head start on the security software they may be using:
inurl:"broadcom.com/enterprisesoftware/communities"
An important skill with something as inherently hit-or-miss as OSINT is outside-the-box thinking. Suppose you’ve tried all of the Google tricks you can think of, looking for different vendors and URL strings, and you’ve come up dry. Well, do you know anything about the people who work there? I once had a client whose IT administrator had a unique name in her personal email address.
It didn’t take long before I linked this to a different username that she had used on Yahoo! in the past. I took this username and tried all kinds of search combinations, and boom – an obscure forum for the administrators of a highly specific operating system in an enterprise environment had posts from a user with this same name. She was careful enough to avoid mentioning her employer, which is why the usual searches described previously didn’t get me there. But I was able to connect the dots and determine she was indeed referring to the configuration of these hosts inside the network of my client, and later I could even correlate independent findings with information in these public posts. The connection that brought me to that information was just her use of an old Yahoo! Messenger name when anonymously discussing her IT work. Needless to say, she was a bit surprised that I had found it.
On a different engagement, I took to Google from the other direction – I was already inside the network and had a foothold on a domain controller. I started grabbing password hashes, which is a massive finding in its own right for my report. However, I wondered what would happen if I tried punching some of these hashes into Google. Sure enough, I found a site where hackers share their loot and my client had been compromised. This was an additional tidbit to enhance the report and helped them get the ball rolling on determining how that unauthorized access had occurred.
Here’s an idea!
Think about how people create passwords, generate some hashes corresponding to your guesses, and search Google for those hashes. Usually (and hopefully), you’ll come up dry. The most common passwords, such as 12345 and iloveyou, are already out there, so think like someone who works for your client and lives near there. For example, one thing I learned while working with companies in the state of Ohio is that Ohioans love college football. Hey, most Midwesterners do. I had a disturbing number of positive hits when I generated hashes based on the word Buckeyes.
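To try this yourself, a few lines of Python will turn a guess into hashes you can paste into the search box. Treat this as a sketch under assumptions: the function names, the case variants, and the suffix list are all hypothetical choices for illustration, and the algorithms are limited to ones Python’s hashlib always provides. Hashes pulled from a domain controller are usually in a different format (NT hashes), which hashlib may not support on every system, so match the algorithm to whatever you are actually comparing against.

```python
import hashlib


def candidate_passwords(base_word: str, suffixes: list[str]) -> list[str]:
    """Combine simple case variants of a word with a list of suffixes."""
    variants = [base_word.lower(), base_word.capitalize(), base_word.upper()]
    candidates: list[str] = []
    for variant in variants:
        for suffix in suffixes:
            candidate = variant + suffix
            # Skip duplicates while keeping the original order.
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


def hash_candidates(
    candidates: list[str],
    algorithms: tuple[str, ...] = ("md5", "sha1", "sha256"),
) -> dict[str, dict[str, str]]:
    """Map each candidate to its hex digest under each algorithm."""
    results: dict[str, dict[str, str]] = {}
    for candidate in candidates:
        data = candidate.encode("utf-8")
        results[candidate] = {
            name: hashlib.new(name, data).hexdigest() for name in algorithms
        }
    return results


if __name__ == "__main__":
    # Hypothetical suffixes; tailor these to what you know about the client.
    guesses = candidate_passwords("Buckeyes", ["", "1", "2020", "!"])
    for guess, digests in hash_candidates(guesses).items():
        for name, digest in digests.items():
            # Wrap the digest in quotes so it is searched as an exact string.
            print(f'{guess:<14}{name:<8}"{digest}"')
```

Each printed digest is already wrapped in quotation marks, so it can go straight into Google as an exact-match search.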
Hey, order’s up. Grab your coffee and bagel, leave the drive-thru, and get to the office – we got a good amount of recon done with Google and our smartphone, but now it’s time to sit down at the helm of Kali and see how the folks at Offensive Security have moved its toolset into this decade.