Hiding in plain sight – OSINT and passive recon
We’ll be making heavy use of Kali Linux throughout this book, but some of the most important work you’ll do for many clients can be done from any device, regardless of a specialized toolset. You might be waiting in line at Starbucks with your personal smartphone, punching in some slick Google queries, and bam – you have a surprising head start before you’ve even arrived at your desk. Then, you sit down at Kali and spend half an hour digging up even more, and you haven’t sent a single packet across the wire to your target. But now, I can hear you at the back: You've said "OSINT" and "passive recon" — is there a difference? That’s a good question, with an annoying answer: It depends on whom you ask. These terms are often used synonymously, but the important distinction is where you’re sending your packets:
- With pure passive reconnaissance, your packets are going to a myriad of resources that are available on the public internet to anyone willing to ask. But they are not going to your target’s network. This can also mean that you aren’t sending any packets at all – you’re merely listening, as we do with wardriving.
- OSINT can mean both this purely passive task where no contact is made with your target and using your target’s resources that are explicitly meant for public use. Does your target allow a potential customer to create a free account? It behooves the pen tester to create an account as a potential customer would, but this probably means you’re directly communicating with your target’s network. The “meant for public use” part is what makes it OSINT.
Sounds like a pretty important distinction, right? The reason why they’re often treated as the same thing is that they both fall under the umbrella of a black box – our experience with the environment is like an ordinary outsider, as opposed to a white box, where, as pen testers, we fully understand the inner workings of the environment and we’re informing our efforts accordingly (of course, we can conduct our testing with only partial knowledge of the environment, which will be a blend of black and white, or a gray box). We’re touching on pen testing philosophy at this point – how realistic is the test in representing a real-world potential attack? For those of us passionate about security, we stand by Shannon's Maxim. That is, we should always assume that the enemy will have full knowledge of how our system works. A real-world enemy will have scoured the internet for any tidbits about their target. A real-world enemy will have created accounts with the target’s services and spent a considerable amount of time gaining the same level of familiarity as any old hand. This being said, your client may need to understand how their environment works from different perspectives, and you might very well be prohibited from using information gained from the view of a registered user. Another consideration is time – you will be operating on a schedule, and you don’t want to put the other phases of the assessment in a crunch.
Walking right in – what the target intends to show the world
The nature of your target will tell you how much is meant to be shown to the world. For example, if your target is a bank, then they will provide comprehensive resources for both their current customers and in their efforts to attract new account holders. Even a more private entity needs to put themselves out there in some regard (for example, a private network that needs to be remotely accessed). There’s an old saying in computer security: the most secure computer is sealed in a concrete box and sits on the ocean floor. If no one can actually use the computer, it seems like a waste of concrete, and so our clients will host services and websites anyway.
Examining the target’s websites
One of the first things I do with a target is browse their website and View page source. This screenshot shows how to grab it in Microsoft Edge, but right-clicking on a page will bring up the option in any of the major browsers:
This option will open a new tab and display the HTML source for the page. Often, this won’t reveal anything that isn’t already visible (it is a markup language, after all). But there may be comments in the source and other treats not intended to be displayed by the browser, and these can give us morsels of information about our target that will inform our attack.
With this client, the page source revealed a folder called assets
:
We see references to scripts that can be found on the host under the assets
folder. So, just drop this into your address bar and see what happens – http://www.your-client.com/assets:
We haven’t even done anything yet – just pulled up the public site in an ordinary browser – but we see this host is telling us a couple of things:
- It’s an Apache server, version 2.4.41, running on Unix (or Unix-like).
- It wasn’t configured in the most secure manner.
That second point is the most important observation. Does revealing the server version like this really matter that much? Sure, it gives us a heads-up for our research, but it’s not exactly a welcome mat either. What it tells us about is the administrator’s general approach to operational security. The kind of server administrator who either doesn’t know or doesn’t care about the risk, regardless of how tiny, is more likely to be the kind of administrator who, for example, asks people in public forums for help with some new hardware at work, even providing logs that you’d be lucky to get during the assessment.
Don’t be so antisocial – examining the target’s presence on social media
We live in funny times, when it seems like everyone and their grandparents are willingly sharing all their personal details with social media companies. Back in my day, you’d hear about the cool kids having a party at their parents’ house and you’d think, now that's the place to be on Friday night. Your target is hearing the same thing about social media today – everyone’s on Facebook, Twitter, Instagram, and TikTok, so that’s where you’re going to meet the cool kids (or potential customers, as the case may be). In this screenshot, we see how our target is encouraging engagement:
You’re not likely to find juicy tidbits about your target from posts that they made on social media. You’re likely to find the good stuff from other users of the social media platform in question. For example, you click the Facebook button and end up on a page set up by your target. You browse the comments: Jane is the GM at the Highland branch and she was really responsive to my needs. Or maybe a photo from a company picnic with 14 likes, and one of the likes is Jane’s, and she loves to share pictures of her pets, kids, car, home, and her favorite latte at Starbucks over on her profile page.
I probably sound like a ranting lunatic (I am, but that’s not important right now), but the point is to soak up all of this information and take good notes. We’re in the first chapter of the book, discussing what will probably be chapter one of your assessment with a client. That Jane names her dog Mr. Scruffles might seem useless, until day four, when you’re prompted with the security question for pet's name. Also consider that Jane’s IT guy, Dave, is a member of a popular Facebook group for IT admins to vent about their jobs; Dave just had a hard day working with your Cisco appliances and he’s ready to upload a diagnostic file.
Tread carefully!
We’re looking for information that’s already there. Do not attempt to communicate with any of the individuals you find during your social media searches, unless you’re conducting a social engineering assessment – this would most certainly not be passive!
Just browsing, thanks – stepping into the target’s environment
Wait a sec. Stepping into the target’s environment? Now I know I'm in the wrong chapter, you think. Indeed, this is where passive recon starts to blend into the broader term OSINT. The keyword so far has been passive – listening from the sidelines or taking a peek in the proverbial windows as we drive by. Now, the keywords are open source – we’re taking a look at things that are meant to be out in the open. We’re going to start getting a little braver with our efforts. Instead of figuratively driving by, we’ll park and walk into the shop and look around. It’s a door for the public and it says Open on the front, so we haven’t stepped outside the realm of open source. Sometimes, however, we can get interesting information about what’s going on behind the counter of our metaphorical shop.
Summoning the daemon – the fat-fingered email address
We’ve all misspelled someone’s name at some point. Perhaps you’re trying to send an email to the administrator of a domain and, gosh darn it, you misspelled administrator
. Oh, these pesky fingers of mine. As my mother-in-law would say, schlimazel (an unlucky or clumsy person). Let’s take a look at our outgoing email:
The point is to send an email to the target domain but to a recipient we know doesn’t exist. You could very well let your cat walk across the keyboard and use that as the recipient – the result would be the same. However, there’s a bit of a social engineering angle going on here. Just in case someone is reviewing these, my message is more likely to look like a legitimate attempt to communicate with the business or government agency. A smashed-keyboard email address and message body will look like a deliberate attempt to provoke a response. Bonus points if you actually do engage in a friendly conversation posing as a customer, but just let one of your messages have a fat-fingered recipient address. By sending an email to a nonexistent email address, we provoke a bounce message. Unlike sending an email to a nonexistent domain, only the target environment is going to know whether or not the user exists. The bounce will come from the target environment and often contains troubleshooting information with tasty tidbits for us fledgling hackers. Let’s take a peek at the non-delivery report from our client:
My favorite part of this bounce message is Diagnostic information for administrators. Golly, that sure is helpful of you, thank you!
I said this earlier, and it should be a mantra throughout the OSINT phase: this isn't exactly a welcome mat. It isn’t the keys to the kingdom, and this isn’t a movie – no amount of furious typing is going to change our position in the assessment. But let’s take a look at what we learned, step by step:
- The server that generated this report is
ME-VM-MBX02
and its IP address is10.255.134.142
. It’s reasonable to guess that this is a virtual machine, as the VM initialism is often incorporated into internal naming conventions by IT folks. It makes it easier to determine what troubleshooting may entail, at a glance. - The server that passed on this information to
ME-VM-MBX02
, our report-generating server, isME-VM-CAS02
, and its IP addresses are10.255.134.140
and10.255.27.36
. - The server that passed this information on to the
CAS02
host isME-VM-MAILGW01
and its IP address is10.255.134.160
. GW probably means gateway.
Hopefully, you have already picked up on the important part. That’s right – those are ten-dot addresses. As a refresher, addresses in the 10.0.0.0/8
block are reserved as private address space as defined by the Internet Assigned Numbers Authority (IANA) (refer to them as ten-dot or ten slash eight and you’ll be one of the cool kids). Addresses in the 10.0.0.0/8
block are not publicly routable, so why do we care, as uninformed outsiders? We’re clearly getting information from behind the perimeter. What else did we notice? Examine this line:
Microsoft SMTP Server (TLS) id 15.0.1497.2
Let’s jump back into our trusty search engine and look for Microsoft
and 15.0.1497.2
. Top result? Exchange Server build numbers and release dates. Search the page for that build number and we end up with Exchange Server 2013 CU (cumulative update) 23, released on June 18, 2019. Well, I’m writing this in 2021, almost 2 years later, so it’s back to the search engine to try this: vulnerabilities
and 2013 CU23
. We end up finding CVE-2021-28480
, CVE-2021-28481
, CVE-2021-28482
, and CVE-2021-28483
– remote code execution vulnerabilities. We already have an internal subnet to investigate: 10.255.0.0/16
. You have to admit this isn’t too shabby when you consider that all we did was send an email. Thus, here comes yet another reminder: take good notes. Write down everything you do. Don’t skimp on the screen captures – I would sometimes record my screen while I worked.
I know a guy – services doing the probing for you
Back in my day, we had to walk 15 miles through the snow to get to the pen test. We didn’t even have computers – we used empty bean cans with a string tied between them to send and receive packets. Okay, I’m joking, but things are definitely different these days for the younglings. There’s a lot of work that can be taken out of your hands in today’s world of what I like to call EaaS: Everything-as-a-Service. This is important for pen testers because it allows you to do more with a small amount of time – you’re only with your client for a set window of time and it won’t feel like enough. You’ll be taking advantage of time-saving measures at all phases of an assessment (hello, scripting ability) but OSINT is no exception – even though we haven’t sat down with Kali yet. Let’s take a look.
Security header scanners
There are a few of these online. Try typing into a search engine security header scanners
. One of the better ones is SecurityHeaderScanner.com, a service I used for this client example:
Yikes. That looks like my report card from my sophomore year of high school (sorry, Mom and Dad). In this particular assessment, I was able to use this information to pull off some successful cross-site scripting, clickjacking, and formjacking attacks. I could have figured this out manually, of course, but the time saved increases the value you provide to your client.
This is an example of a real-time test of public resources provided by your target – we asked this particular service to visit the website now and tell us what it sees. Another way to look at this pre-Kali stage of OSINT is to gather the information that has already been gathered by all of those crawlers taking peeks at every corner of the internet, 24/7/365. We need to be aware of the difference, as the information we find from such resources is not real-time and may not be accurate at the time of your assessment.
Open source wireless analysis with WIGLE
I would never forgive myself if I didn’t mention wigle.net in the context of open source digging with sites that did the probing for us already. This one is special, though – it’s a true crowd-sourced initiative. Resources like Shodan are organizations that own their probing and crawling machines. Their game is to give you access to the database they built with their own hardware. WIGLE, on the other hand, is a collection of what the world of volunteer wardrivers have gathered with their own hardware and mode of transportation.
Note
If the term is unfamiliar, wardriving refers to the practice of moving around an area with a device configured to detect and report wireless networks. The name suggests driving a car, as that’s a great way to cover larger areas, but you can also go warbiking, warwalking, or even send out a wardrone or a warkitteh (a man attached Wi-Fi sniffing hardware to his outdoor cat’s collar). I’m still not sure if warscooting is a thing yet.
At the time of writing, wigle.net contains information about 745 million networks, gathered from 10.5 billion individual observations. The key to the observations is the combination of device reconnaissance and GPS data, allowing you to place the observation on a map. Keep in mind, these locations are where the observation was made, not the location of the access point. This becomes clear when you zoom in on the map, as shown here:
You can see the observations largely center on roads, suggesting that the observers are driving around with their laptops or smartphones. But you can also see spots in the middle of wide-open spaces, like Firefighters Park in the preceding screenshot, or even in the middle of the ocean, as shown in the following screenshot:
These observations likely correspond to shipping lanes or even airways. This should give you an idea of the sheer size of this dataset.
Where it will be useful to you, as an intrepid open source investigator, is gathering information about wireless networks without setting foot near the site. With some clients, this won’t really mean much. But for others who may be physically spread out, like with a massive data center or numerous individual facilities, some recon on the location of certain networks may come in useful. Again, by location, we mean the area where an observation was possible. Wireless networks are low-power, and most wardrivers aren’t packing exceptionally high-gain antennas while driving around, so you can assume you’ll be within a block or two, if not closer.