We can define a web attack as an activity that “targets vulnerabilities in websites to gain unauthorized access, obtain confidential information, introduce malicious content, or alter the website’s content” [2]. This definition includes the preparatory steps necessary for successful attacks in the context of web applications, such as information gathering, context-related risk analysis (threat modeling), and vulnerability discovery and analysis.
We will usually encounter these activities whether we are penetration testers, code reviewers, security researchers, or bug hunters. Even if we are red teamers and work primarily on networks and operating systems, we can run into web applications during Initial Access [3], as well as when solving web challenges in Capture the Flag (CTF) exercises.
Understanding these types of attacks can prove beneficial for various roles:
- Developers: Gaining an “attacker’s” perspective can assist in writing more secure code. This efficient approach is commonly incorporated into the security awareness courses we teach.
- Forensic Analysts and Incident Responders: They might need to analyze incidents involving applications or web servers. Knowledge about these attacks can provide a comprehensive understanding of what happened.
- Security Managers and Chief Information Security Officers: They may need to assess and manage risks related to web applications. This understanding can be instrumental in forming strategic security measures.
Now that we know what web attacks are, let’s look at how to approach them when dealing with an application.
What is exploitation?
Let’s clarify the definition of exploitation we discussed in the epigraph, so we’re all on the same page.
It all begins with a bug – an issue in the code, design, or configuration that generates a malfunction, incorrect results, a crash, or an abnormal termination.
We are particularly interested in bugs that have security implications (security bugs), which can potentially be used to compromise an application or one of its components.
Unfortunately, or fortunately, not all security bugs are potentially exploitable; when they are, they are called vulnerabilities.
So, an exploit is code or a procedure that allows you to take advantage of one or more vulnerabilities, and exploitation is the term used to describe this process.
The approach
Discovering and exploiting vulnerabilities can be likened to a problem-solving exercise.
Consider this example – we were hired to conduct a Web Application Penetration Test (WAPT) on one web application accessible online: https://onofri.org/security/. We started from scratch – no credentials or inside information about the target. Thus, we interacted with a user-friendly web application that reciprocated our requests with HTML code, JavaScript, CSS, and images. What’s our next move?
If this were our first time engaging in such an activity or our first encounter with this target type, we could consider two distinct approaches. The first, a more academic approach, involves studying all relevant theoretical concepts before proceeding to the practical stage. The second, a decidedly tinkering approach, encourages hands-on experience.
However, there is a third way to balance these two extremes. As the Latins once stated, “In medio stat virtus” (“virtue stands in the middle”):
- Acquire a foundational understanding of theoretical concepts. This doesn’t mean becoming an expert; the goal is to gain context and orient ourselves in specific situations. This foundation has two parts: understanding the technology itself, and knowing the vulnerabilities and attacks that might apply to it.
- Dive into hands-on practice. This involves exploring our needs through trial and error, observing an application’s responses to our requests, and modifying the application to understand its workings better. In this process, we loop back to theoretical concepts as and when required. This iterative approach allows for both practical and theoretical growth.
Following the various steps, let’s see how we use this approach when attacking a web application.
The approach in the book
This book embodies this approach through its structure – the initial part serves as a primer, while the following two provide practical, scenario-based examples.
Moreover, every scenario-centric chapter commences with a theoretical discussion before transitioning into the practical aspect.
The process
When we launch a web attack, we rely on a process that involves preparatory steps such as information gathering, threat modeling, vulnerability discovery, and vulnerability analysis. Then, we have the actual attack, which – if successful – leads to exploitation. These steps are based on the technical sections of the Penetration Testing Execution Standard (PTES) [4].
Information gathering
If we start without having any information about the target, the first thing we do is to understand the technology that underpins the application. There are several methods. Examining the HTML code returned from https://onofri.org/security is the most straightforward and least invasive. We can do this from any web browser, such as Firefox, by pressing Ctrl + U on Windows and Linux or Cmd + U on macOS.
We will find two particularly interesting lines from the HTML code associated with the meta tag named generator. As the name suggests, this tag typically contains information about the software used to generate the page:
<meta name="generator" content="WordPress 6.2.2"/>
The code remains quite clear, even if we do not know HTML. We can now infer that WordPress version 6.2.2 powers the website.
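The manual check above can also be sketched in a few lines of Python. This is only an illustration using the standard library's `html.parser`; the class and function names are our own, and the sample string stands in for a page we would have saved from the browser.

```python
from html.parser import HTMLParser


class GeneratorFinder(HTMLParser):
    """Collect the content of every <meta name="generator"> tag."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta .../> are also routed here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "generator":
            self.generators.append(attributes.get("content") or "")


def find_generators(html_text):
    """Return the generator strings declared in an HTML document."""
    finder = GeneratorFinder()
    finder.feed(html_text)
    return finder.generators


# Sample input copied from the line shown above.
sample = '<html><head><meta name="generator" content="WordPress 6.2.2"/></head></html>'
print(find_generators(sample))
```

A site can remove or fake this tag, so the result is a hint to verify rather than proof of the installed version.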
Our next step is to visit the WordPress site for further investigation. First, we will check whether the installed version is the latest and whether any known vulnerabilities are associated.
To become more familiar with WordPress, which is open source software with publicly available code, we will download it and examine its file structure and contents. We can read the PHP (a recursive acronym for PHP: Hypertext Preprocessor) code and understand the structure: a set of foundational files, named WordPress Core, and a wide range of plugins and themes.
The source code gives us a significant advantage because it allows us to find vulnerabilities through static analysis by reviewing the code instead of relying solely on dynamic methods, such as sending queries. It also allows us to recreate the target application in our lab environment for analysis. This controlled environment allows us to modify the application, enhancing our understanding in a more “hybrid” fashion.
As Core allows additional plugins and themes, our next step should be identifying which ones are installed. Let’s start with the installed theme.
The file structure shows the themes inside the wp-content/themes directory. We then examine the HTML code again for this information. We can find it easily:
<script src='https://onofri.org/security/wp-content/themes/astra/assets/js/minified/frontend.min.js?ver=4.1.5' id='astra-theme-js-js'></script>
We’ve determined that the active theme is astra. We know the theme but not the version. However, we can download it to determine where to read the version. From the theme directory, we find the following file list:
404.php, admin, archive.php, assets, changelog.txt, comments.php, footer.php, functions.php, header.php, inc, index.php, languages, page.php, readme.txt, screenshot.jpg, search.php, searchform.php, sidebar.php, single.php, style.css, template-parts, theme.json, toolset-config.json, wpml-config.xml
Take readme.txt, for example, which contains extensive metadata. Unfortunately, we get blocked when we try to access it via https://onofri.org/security/wp-content/themes/astra/readme.txt.
Undaunted, we look for an alternative and find that changelog.txt contains the version information and is accessible via https://onofri.org/security/wp-content/themes/astra/changelog.txt. We can get the installed version from here by looking for the latest entry:
v4.1.5
- Fix: Offcanvas Menu - Transparent empty canvas visible on expanding offcanvas toggle button.
- Fix: Custom Layouts - Block editor core blocks custom spacing incorrectly applies to custom layout blocks in editor.
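The two lookups we just did by hand (the theme directory in the page source and the newest entry in the changelog) can be sketched as follows. The regular expressions and function names are our own assumptions, and the sketch assumes that the changelog lists the newest version first, as in the excerpt above.

```python
import re

# Assumption: theme assets are served from /wp-content/themes/<slug>/.
THEME_PATH = re.compile(r"/wp-content/themes/([A-Za-z0-9_-]+)/")
# Assumption: each changelog entry starts with a line such as "v4.1.5".
VERSION_HEADING = re.compile(r"^v?(\d+(?:\.\d+)+)\s*$", re.MULTILINE)


def theme_slugs(html_text):
    """Return the distinct theme directory names referenced in the page."""
    return sorted(set(THEME_PATH.findall(html_text)))


def latest_changelog_version(changelog_text):
    """Return the first version heading found, or None if there is none."""
    match = VERSION_HEADING.search(changelog_text)
    return match.group(1) if match else None


page = (
    "<script src='https://onofri.org/security/wp-content/themes/astra/"
    "assets/js/minified/frontend.min.js?ver=4.1.5'></script>"
)
changelog = "v4.1.5\n- Fix: Offcanvas Menu - Transparent empty canvas visible.\n"

print(theme_slugs(page))
print(latest_changelog_version(changelog))
```

The `ver` query parameter in the asset URL often matches the theme version, but it can be overridden, which is why we confirm it against a file shipped with the theme.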
In addition, our familiarity with WordPress allows us to identify the login page address (https://onofri.org/security/wp-login.php) and potentially perform actions such as user enumeration or password discovery.
This is an example of our strategy when targeting a web application. Given the target scope of https://onofri.org/security, we can discover numerous other elements.
Now that we know the version of WordPress, the theme, and its version, we can proceed by enumerating the installed plugins.
This can be done passively by examining the generated code or more actively (and somewhat aggressively) by creating a list of all available plugins (or the most commonly installed ones) and checking for the presence of files in the target path.
In the same way, we can consider a wordlist of common files such as phpinfo.php, info.php, or test.php.
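As a minimal sketch of this wordlist idea, the following Python code builds the candidate URLs and keeps those that a status-checking function reports as present. The `fetch_status` parameter is a hypothetical hook for whatever HTTP client we use, and `fake_fetch_status` is only a stand-in so the example runs offline.

```python
from urllib.parse import urljoin

COMMON_FILES = ["phpinfo.php", "info.php", "test.php"]


def candidate_urls(base_url, names):
    """Build one URL per wordlist entry under the base path."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    return [urljoin(base, name) for name in names]


def find_exposed(base_url, names, fetch_status):
    """Return the candidates for which fetch_status(url) reports HTTP 200."""
    return [
        url for url in candidate_urls(base_url, names) if fetch_status(url) == 200
    ]


def fake_fetch_status(url):
    # Stand-in for a real HTTP request; pretends only test.php exists.
    return 200 if url.endswith("/test.php") else 404


print(find_exposed("https://onofri.org/security", COMMON_FILES, fake_fetch_status))
```

The same loop works for plugin enumeration by swapping the wordlist for plugin paths. It is the more aggressive option, so it should only be run against targets that are in scope.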
Threat modeling
Once we understand our target, we will prepare our potential avenues of attack. To determine the most effective types of attacks, we need to understand the context and related risks. This practice is called threat modeling. We can be specific about the capabilities and the technology used and match them to our goals, such as the following:
- If a SQL database is used, we might try SQL injection to gain database access (see an example in Chapter 4).
- If there are functions that send commands to the operating system, we can attempt command injections to execute arbitrary commands (see an example in Chapter 5).
- If a login page is available, we might try to access the admin panel or impersonate other users to have more control over the application (see an example in Chapter 3).
- If we can display input strings under our control, we can look for cross-site scripting (XSS) to execute arbitrary JavaScript on a user’s browser (see examples in Chapter 4 and Chapter 6).
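The pairing of observed capabilities with candidate attacks in the list above can be kept as a simple lookup table. The keys and labels below are our own shorthand, not a standard taxonomy.

```python
# Assumption: capability names are informal labels chosen for this sketch.
ATTACKS_BY_CAPABILITY = {
    "sql_database": ["SQL injection"],
    "os_commands": ["command injection"],
    "login_page": ["admin panel access", "user impersonation"],
    "reflected_input": ["cross-site scripting (XSS)"],
}


def candidate_attacks(observed):
    """Return the attacks worth testing for the observed capabilities, in order."""
    attacks = []
    for capability in observed:
        for attack in ATTACKS_BY_CAPABILITY.get(capability, []):
            if attack not in attacks:
                attacks.append(attack)
    return attacks


print(candidate_attacks(["login_page", "reflected_input"]))
```

Unknown capabilities are simply ignored, which is a reminder that such a table is only as complete as the knowledge we put into it.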
Alternatively, we can use a relatively simple method from risk management: prompt lists or checklists. These lists can guide us on what risks and attacks to consider. We can use the Open Worldwide Application Security Project (OWASP) Web Security Testing Guide (WSTG) [5] (formerly the OWASP Testing Guide), which provides a massive list of attacks organized into different categories.
Although these lists are massive, they are incomplete. For example, at OWASP Italy Day 2012, with a friend, we presented a study on semantic web-related vulnerabilities. We explained the SPARQL Protocol and RDF Query Language (SPARQL) and how to perform SPARQL injections [6]. We also found a SQL injection inside the SPARQL endpoint. Despite this, SPARQL injection is not currently listed in the testing guide.
Vulnerability analysis
Armed with enough information about the target and a defined threat model, we can begin discovering vulnerabilities, analyzing them, and attempting to exploit them. This step typically varies in the amount of time it takes. We will focus on this particular aspect, as well as exploitation, in our book.
Let’s go ahead and continue with our example.
We will check whether WordPress, its plugins, and its themes are up to date with the latest version or whether known vulnerabilities are present. This time, we were out of luck.
However, in our search for vulnerable pages, we discovered a test page that had been inadvertently exposed. Its guessable name, test.php, tipped us off.
When we visit the page at https://onofri.org/security/test.php, we find a form to enter text input. By inputting the text hello there, we find it within the response exactly as we wrote it or, as we say in the jargon, “reflected”.
We can also see the effect by directly typing the text into the URL, using + instead of a space: https://onofri.org/security/test.php?param=hello+there.
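The two spellings of a space that appear in this walkthrough, + and %20, can be reproduced with Python's standard `urllib.parse` module. This is a small sketch to show that both decode to the same value; the variable names are ours.

```python
from urllib.parse import parse_qs, quote, urlencode

# Form-style encoding, as used in query strings: a space becomes "+".
query = urlencode({"param": "hello there"})
print(query)  # param=hello+there

# Percent-encoding: a space becomes "%20".
percent = quote("hello there")
print(percent)  # hello%20there

# Both forms decode to the same parameter value.
from_plus = parse_qs(query)["param"][0]
from_percent = parse_qs("param=" + percent)["param"][0]
print(from_plus == from_percent)  # True
```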
Let’s look at the source code:
<p id=echoed>
hello there
</p>
If we can execute arbitrary JavaScript code (e.g., an alert appears), we have found XSS. Since we are looking for XSS, let’s first see whether we can insert arbitrary HTML code. Let’s try the b tag, which makes the text bold – <b>hello there</b>.
We can also write it directly into the URL (the browser can automatically substitute the space with a +): https://onofri.org/security/test.php?param=<b>hello+there</b>.
Let's look at the source code again:
<p id=echoed>
<b>hello there</b>
</p>
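The comparison we just made by eye (does the markup come back untouched, or does the application encode it?) can be sketched as a small helper. The function name and the three labels are our own; `html.escape` is the standard library's HTML encoder, used here as an approximation of what an application's output encoding would produce.

```python
import html


def classify_reflection(payload, body):
    """Classify how a payload appears in a response body."""
    if payload in body:
        return "raw"  # returned exactly as sent
    if html.escape(payload) in body:
        return "escaped"  # returned, but HTML-encoded
    return "absent"


payload = "<b>hello there</b>"
print(classify_reflection(payload, "<p id=echoed>\n<b>hello there</b>\n</p>"))
print(classify_reflection(payload, "<p>&lt;b&gt;hello there&lt;/b&gt;</p>"))
print(classify_reflection(payload, "<p>nothing</p>"))
```

Only the "raw" case is worth pursuing for XSS in this simple form; an escaped reflection would need a different context or encoding trick.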
Well, we are almost there! Let’s add some JavaScript code. To perform the classic XSS attack, we need to include the code alert(1) within the script tag – <script>alert(1)</script>. This will trigger a pop-up alert with the number 1.
We can also write it directly in the URL: https://onofri.org/security/test.php?param=<script>alert(1)</script>.
This time, things are not going the way we hoped. The response reads Not Acceptable! Let’s look at the code:
<head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>
Mod_Security replied to us. We can go and look up what it is. According to its official GitHub [7], it’s an open source Web Application Firewall (WAF). So, we have a defense system that needs to be bypassed.
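When we send many requests, it helps to recognize this block page automatically. The sketch below is a heuristic of our own: it assumes the denial arrives with status 406 (the code whose reason phrase is "Not Acceptable") or 403, and it relies on the marker string seen in the error page above, which depends on how the WAF is configured.

```python
# Assumption: statuses a WAF denial is likely to use; verify against the target.
BLOCK_STATUSES = {403, 406}


def looks_like_modsecurity_block(status_code, body):
    """Heuristic: a denial status plus the marker seen in the error page."""
    return status_code in BLOCK_STATUSES and "mod_security" in body.lower()


blocked_body = (
    "<head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1>"
    "<p>This error was generated by Mod_Security.</p></body></html>"
)

print(looks_like_modsecurity_block(406, blocked_body))
print(looks_like_modsecurity_block(200, "<p id=echoed>hello there</p>"))
```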
Is it possible? Impossible? Easy? Difficult? If it’s the first time we have encountered it, we can’t know. It also depends on how it’s configured and which rules are applied.
The important thing is to take heart and proceed. Of course, bypasses can require time.
Let’s think rationally. We can assume that the script tag triggers Mod_Security. We can try another vector with a different tag, one of our favorites – <img src=x onerror=alert(1)>. This vector tries to load a non-existent image, x, specified in the src attribute, and fires an alert through the onerror attribute when loading fails.
We are cautious and see first whether it likes the img tag (in this case, the browser changed the space to %20 – the corresponding hexadecimal ASCII code): https://onofri.org/security/test.php?param=<img%20src=x>.
Let’s look at the code:
<p id=echoed>
<img src=x>
</p>
It returns the image code, so it likes this. Let’s proceed with the full vector: https://onofri.org/security/test.php?param=<img%20src=x%20onerror=alert(1)>.
Unfortunately, it didn’t work. Mod_Security blocked us again:
<head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>
Exploitation
To exploit this, we need to be creative.
We can search the internet for known bypasses and throw them at the server at random, or we can be more surgical and study how Mod_Security and its rules work. The rules most often applied are those of the OWASP coreruleset.
Reading the XSS-specific configuration file [8], we find that the img tag is filtered:
[…] h1|head|hr|html|i|iframe|ilayer|img|input|ins|isindex|kdb […]
But the video tag, defined in HTML5, is missing.
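Checking which tags a rule lists is easy to script once we have copied the alternation out of the rule file. The sketch below uses only the fragment quoted above, so its answer is meaningful only for tags that would fall within that fragment; against the real rule, we would paste the full alternation. The function names are our own.

```python
# Fragment quoted above; the real rule lists many more tags on either side.
EXCERPT = "h1|head|hr|html|i|iframe|ilayer|img|input|ins|isindex|kdb"


def listed_tags(alternation):
    """Split a regex alternation of literal tag names into a set."""
    return set(alternation.split("|"))


def unlisted(candidates, alternation):
    """Return the candidate tags that the alternation does not mention."""
    listed = listed_tags(alternation)
    return [tag for tag in candidates if tag not in listed]


print("img" in listed_tags(EXCERPT))
print(unlisted(["iframe", "img", "video"], EXCERPT))
```

This only works because the rule is a plain list of literal names; a pattern with wildcards or character classes would need to be tested with a regex engine instead.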
So let’s try the modified vector – <video src=x onerror=alert(1)>: https://onofri.org/security/test.php?param=<video%20src=x%20onerror=alert(1)>
Figure 1.1 – An alert from XSS
We then notice an alert message in our browser. We exploited XSS. You can try it in your browser, assuming that this will be allowed after the book’s publication. But, in general, it’s just like a cat-and-mouse game:
<p id=echoed>
<video src=x onerror=alert(1)>
</p>
Of course, it’s not the only bypass that exists. To find a different one with ease, just read the code, study which tags can trigger JavaScript in the various versions of HTML, and try.
We exploited XSS by executing arbitrary JavaScript code, which often suffices in a web application penetration test. If we want to go further, we can weaponize XSS to steal the cookies of the WordPress admin.
Post-exploitation
Imagine what happens next. Let’s suppose, through some clever social engineering, we send a link to an administrator and then hijack their session.
Once we gain access to the WordPress admin panel, we can check whether the feature that allows direct editing of plugins or uploading custom plugins via the web interface is enabled.
This allows us to execute arbitrary PHP code on the server, which enables us to perform various actions. For example, we could load a custom web shell or use an existing one, such as those available on GitHub [9].
Even though we can execute system commands directly on the server via PHP, we are likely operating as a limited user. Therefore, we can gather more information to identify configuration issues or check whether there’s any outdated software running as root.
Alternatively, if we’ve stayed stealthy enough, we can patiently wait for an exploit affecting the specific software version running on the target to surface and then escalate to root access, a strategy we’ve used successfully many times before.
In this brief web application penetration test example, we’ve navigated through various process steps to plan and execute an attack on a web application, combining theory and practice. We’ve also applied various testing techniques by interacting with the application and reading the code. We’ve realized that we need a set of skills: the basics help us right away, and we learn the others as we encounter them. Finally, we’ve realized that we need a resilient mindset that doesn’t shy away from challenges, pushes us to dig deeper when necessary, and spurs us to use our creativity to find new solutions.
We will explore these aspects in the following sections.
The testing techniques
Our example highlights that, initially, we only interacted with the application. However, when the source code became available, we used it to gain an advantage through a more holistic approach. These techniques are also specified in Appendix C of NIST SP 800-115 [10], a technical guide with a similar process to PTES but enhanced with a more high-level vision.
Static analysis (white box)
When doing static analysis, we analyze the source code of the application. We must either have the source code or analyze the disassembled/decompiled code to do this.
The analysis is performed without executing the code, which remains static.
In this case, it is necessary to know the language in which the software is written, the peculiar bugs, and how to recognize them by reading.
In the case of web applications, code is usually interpreted, so you need to know how to read server-side languages such as Python, PHP, Ruby, and C#. In other cases, you have bytecode, as with Java classes, which we usually try to disassemble or decompile.
It is also useful to know client-side programming and markup languages such as HTML, JavaScript (now also used server-side), and CSS, which can come into play in some complex attacks.
To use this approach, we must have many programming language skills or quickly recognize those we need to learn better.
Moreover, we may miss vulnerabilities since some can only be identified when running the code.
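To give a flavor of static analysis, here is a deliberately crude sketch that scans PHP source, without running it, for lines where request data is passed straight to an output construct. The pattern, the function name, and the PHP sample are all hypothetical; real code review relies on understanding data flow, not on a single regular expression.

```python
import re

# Crude heuristic: echo/print followed, on the same statement, by a request
# superglobal. It also flags safely encoded output, so hits need manual review.
SINK = re.compile(r"\b(?:echo|print)\b[^;]*\$_(?:GET|POST|REQUEST|COOKIE)\b")


def suspicious_lines(php_source):
    """Return (line number, stripped line) pairs that match the heuristic."""
    return [
        (number, line.strip())
        for number, line in enumerate(php_source.splitlines(), 1)
        if SINK.search(line)
    ]


# Hypothetical PHP file, written only for this example.
HYPOTHETICAL_PHP = """<?php
$name = htmlspecialchars($_GET['name']);
echo $name;
echo $_GET['param'];
"""

for number, line in suspicious_lines(HYPOTHETICAL_PHP):
    print(number, line)
```

The sketch misses input that is first copied into a variable and echoed later, which illustrates why static review needs knowledge of the language and its typical bugs.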
Dynamic analysis (black box)
In dynamic analysis, we analyze an application when code is executed in its environment, manipulating the input and observing the application or system’s reactions. We call this practice fuzzing.
Generally, with a web application, we do this by manipulating inputs in GET and POST parameters, cookies, headers, HTTP verbs, and so on. Other approaches include debugging and instrumentation – using an additional tool to run the target software under controlled conditions and observing it from the inside.
In the case of web applications, we usually use browsers and proxies to interact with the target and any libraries or frameworks that may be useful to automate our work.
In addition, when analyzing interpreted languages, we can also target the interpreter’s functions, usually written in a lower-level language (for example, we can analyze an application written in PHP and then dig into the interpreter’s code, developed in C).
Also, in this case, we may miss some vulnerabilities if we are not able, through our requests, to reach all the branches of code, such as a vulnerability in a portion of code nested within a considerable number of if statements or in a function that is rarely called.
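The input manipulation described in this section can be sketched as a generator that changes one query parameter at a time. The function name, the payload list, and the example.test URL are placeholders of our own; a real fuzzer would also cover POST bodies, cookies, headers, and verbs, and would send the requests and inspect the responses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def mutate_query(url, payloads):
    """Yield one URL per (parameter, payload) pair, changing one value at a time."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    for index, (name, _) in enumerate(params):
        for payload in payloads:
            mutated = list(params)
            mutated[index] = (name, payload)
            yield urlunsplit(parts._replace(query=urlencode(mutated)))


# Placeholder target and payloads, for illustration only.
for mutated_url in mutate_query("https://example.test/page?a=1&b=2", ["'", "<b>"]):
    print(mutated_url)
```

Changing a single parameter per request keeps the application's reaction attributable to one input, at the cost of sending more requests.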
Hybrid analysis (gray box)
If we have both the running environment and the source code – because the software is provided to us, it is open source, or because we found it through other vulnerabilities – we can use a hybrid analysis.
This is the approach we mainly prefer for its effectiveness and efficiency.
Having the code available in one hand and having our proxy in the other hand, we can test what we read – looking for some good entry points from the source.
By utilizing techniques such as fuzzing and program flow verification through source analysis or leveraging debugging tools such as VS Code or dnSpy, we can effectively utilize the benefits of both dynamic and static analysis to uncover interesting findings at an accelerated pace.
The baseline competencies
As noted in the example and cited in NIST SP 800-115, we also need skills in the technologies, systems, environments, programming languages, secure coding practices, vulnerabilities, and tools.
Web technologies
For web technologies, systems, and environments, we can turn to the vulnerability stack [11] that lists architectural components in modern web applications:
- Firewall/proxy/load balancer/web application firewall: These systems typically stand between us and our target application. They can interact with the requests we send and the responses we receive, so we must be able to recognize their presence, understand the impact they can have on our requests, and, where necessary, bypass WAFs.
- Web servers and web application servers: Web applications are typically served through web application servers, which forward our request to the code interpreter. Depending on the web server/web application server type, we may have different attack surfaces (such as the well-known Tomcat administration pages) or peculiar behavior that can be exploited, such as HTTP Parameter Pollution.
- Proprietary or third-party application code: Proprietary web applications often use a series of third-party libraries or frameworks that may contain interesting vulnerabilities or provide defense APIs that must be used correctly.
- Databases: Nowadays, applications use different types of databases (accessed directly or via Object-Relational Mapping (ORM)), such as NoSQL, data lakes, and cloud storage.
- Virtualization systems: Modern, fully scalable web architectures usually use virtualization systems such as Docker, Podman, and similar technologies. Infrastructure as code has its architectural peculiarities, one being how secret values are handled and how they can be leaked.
- Operating systems: If we work on a vulnerability that impacts the filesystem, such as a path traversal, it is essential to know how the filesystem of the specific operating system works. Likewise, when we exploit command execution, we need to know how the specific shell works in order to escape from it. Knowledge of the operating system is also crucial in the post-exploitation phase for further discovery and privilege escalation, because an actual attacker might not stop at executing commands as a regular user on the machine where the application runs but might also escalate their privileges, becoming root on Linux or SYSTEM on Windows.
- Infrastructure and cloud: When we test applications, we must also consider where an architecture is hosted. Suppose we are within the target’s network. In that case, we have several possibilities for lateral movement. In contrast, if we are in the cloud, it changes the activity’s Operational Security (OPSEC). Due to the presence of APIs, we can exploit vulnerabilities such as Server-Side Request Forgery (SSRF) in a new way.
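The HTTP Parameter Pollution behavior mentioned in the list above arises because components disagree on what to do when a parameter appears more than once. The sketch below parses a duplicated parameter with Python's `urllib.parse.parse_qs` and applies three resolution strategies; the strategy names are our own, and which one a given server or framework applies is something to verify in the lab rather than assume.

```python
from urllib.parse import parse_qs


def resolve(query, strategy):
    """Collapse repeated parameters using one of three illustrative strategies."""
    values = parse_qs(query)
    if strategy == "first":
        return {name: items[0] for name, items in values.items()}
    if strategy == "last":
        return {name: items[-1] for name, items in values.items()}
    if strategy == "join":
        return {name: ",".join(items) for name, items in values.items()}
    raise ValueError("unknown strategy: " + strategy)


query = "id=1&id=2"
print(resolve(query, "first"))  # {'id': '1'}
print(resolve(query, "last"))   # {'id': '2'}
print(resolve(query, "join"))   # {'id': '1,2'}
```

If a WAF inspects the first occurrence while the application reads the last one, the second value passes unexamined, which is the kind of mismatch worth looking for.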
It is important to know the protocols and technologies we utilize, including SSL/TLS, HTTP [12], and the fundamental concepts of web languages such as HTML and JavaScript.
Let’s suppose our goal is to identify vulnerabilities in web3 applications. In such a situation, we need to understand the basic concepts of blockchain and the languages used in smart contracts. For example, if we intend to investigate a smart contract on the Ethereum blockchain or one of its derivatives, familiarity with the programming language used – in this case, Solidity – will be immensely beneficial.
Tools
A note on tool knowledge is important because, in this book, we want to focus on manual activities. If we use a tool, we need to know it well, test it first in our lab, and understand its pros and cons. Automatic tools such as vulnerability scanners often find only simple issues if the scan goes well, or break the application if the scan goes wrong. Automatic code review tools, in turn, tend to produce many false positives; we get good results only after good tuning.
It can also often be helpful to write your own tool to understand a topic better or to exploit a vulnerability properly.
We will talk about the basic tools directly in the next chapter.
Vulnerabilities
For a deeper understanding of web vulnerabilities, we can rely on various methodologies, such as those provided by OWASP. As mentioned, the WSTG provides a comprehensive list of vulnerabilities to consider in our discovery.
In this book, we will indeed discuss several vulnerabilities. Each theoretical section of the various scenarios will highlight these vulnerabilities for a better understanding.
The mindset
In this activity, attitude is critical. To borrow from the Socratic paradox, we should begin with the premise that we neither know nor think we know anything. We can’t afford to take anything for granted. For example, if a WAF filters our attacks, we should not assume that the attack is impossible. Similarly, a fully patched application doesn’t preclude the existence of new vulnerabilities. We need to learn how. And we can do it through trial and error, insight, or top-down and bottom-up approaches, as in all learning processes. We need to ask ourselves the right questions, and we need to seek answers through empirical evidence. Naturally, all of this requires time and dedication.
To assist us in this endeavor, we’ve established a set of mindset principles to keep us goal-oriented.
The right mindset
We must never take anything for granted. We must learn fast and, when confronted with things we don’t know, not stop but strive to move forward.
Creativity
Our first principle, creativity, requires us to think outside the box.
Let’s consider exploiting a web application – we aim to make the application perform functions not intended by its developers. For example, we might manipulate a feature meant for photo album uploads to execute server commands – all through a chain of vulnerabilities linked to a PHP deserialization attack triggered by a simple cookie.
Whenever we encounter an input, a parameter, or a specific behavior, we must strive to understand its functionality and explore unconventional ways of using it.
This involves employing lateral and creative thinking.
Curiosity
Our second principle, curiosity, encourages us to question everything persistently.
We should be curious – intrigued to see the outcome when we input unexpected parameters, eager to understand how an object functions, and keen to manipulate it to suit our intentions. As Loyd “The Mentor” Blankenship penned in Phrack issue 7, “My crime is that of curiosity” [13].
Being curious also means committing to in-depth study. This involves exploring beyond the first pages of a search engine, seeking out primary sources, and delving deep – usually beyond aesthetically pleasing websites to text files that appear antiquated, much like Request for Comments (RFCs), reading the source code when available, or decompiling it.
Commitment
Our third principle, commitment, reminds us to “play hard”.
We must dedicate time to reading, studying, and practicing to satisfy our curiosity. Learning goes beyond just absorbing information; it also entails applying our knowledge, testing it, and refining it until we fully understand every aspect.
It’s a time-consuming process, and our intrinsic passion fuels our dedication. Our commitment entails knowing our craft well and persevering when faced with a notably secure system. Sometimes, the solution is just around the corner. Even years after the first SQL injection was uncovered, we can still discover low-hanging fruit – vulnerabilities relatively easy to find and exploit, even with automated tools.
However, that’s not always the case. We may need to explore many avenues, make numerous attempts, conduct extensive research to identify a vulnerability, and then exert even more effort to exploit it. We’ve often discovered previously unknown vulnerabilities after weeks of analysis, with successful exploitation taking months. We must continue searching for new vulnerabilities within complex environments; our efforts will inevitably be rewarded.