Scenario
You are the owner of an e-commerce website that sells a wide range of products, from electronics to fashion items. Your website has been operational for a few years, and over time, you have managed to build a sizable customer base. However, you have noticed that your conversion rates (the percentage of visitors who make a purchase) have been stagnant, and you are unsure of the reasons behind this trend.
To improve your conversion rates and enhance the overall user experience on your website, you have decided to analyze the clickstream data generated by your users. Clickstream data refers to the record of user interactions and activities on your website, such as the pages visited, the links clicked, the products viewed, and the time spent on each page.
By analyzing this data, you aim to gain valuable insights into user behavior and preferences, which can help you identify potential bottlenecks, pain points, and areas for improvement in your website’s user experience. Additionally, you hope to uncover patterns and trends that can guide your marketing and product development strategies, leading to increased conversions and revenue.
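To make these two definitions concrete, the following is a minimal sketch of what a clickstream record and a conversion-rate calculation could look like. The ClickEvent class, its field names, and the sample values are assumptions made for illustration only; they are not the schema used later in the chapter.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ClickEvent:
    """One recorded interaction; every field name here is hypothetical."""
    visitor_id: str
    event_type: str        # e.g. "page_view", "click", or "purchase"
    page: str
    seconds_on_page: float


def conversion_rate(events: Iterable[ClickEvent]) -> float:
    """Return the percentage of distinct visitors with at least one purchase."""
    events = list(events)
    visitors = {e.visitor_id for e in events}
    buyers = {e.visitor_id for e in events if e.event_type == "purchase"}
    if not visitors:
        return 0.0  # no traffic means there is nothing to convert
    return 100.0 * len(buyers) / len(visitors)


# Made-up sample data: four visitors, one of whom completes a purchase.
sample_events = [
    ClickEvent("v1", "page_view", "/electronics/headphones", 42.0),
    ClickEvent("v1", "purchase", "/checkout", 95.5),
    ClickEvent("v2", "page_view", "/fashion/jackets", 12.3),
    ClickEvent("v3", "click", "/fashion/jackets", 3.1),
    ClickEvent("v4", "page_view", "/home", 8.0),
]

rate = conversion_rate(sample_events)
```

The sketch counts distinct visitors rather than raw events, so a visitor who views many pages before buying is still counted once.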
Requirements
As in the previous projects, gathering the requirements is the starting point. In this scenario, your objective is to analyze the clickstream data, and two user profiles with different backgrounds and technical skills want to extract information from it:
- Business analysts: Business analysts do not feel confident creating and running SQL queries and look for a more visual alternative to explore the data.
- Technical users: Technical users are familiar and comfortable with SQL and want to be able to run complex queries to answer specific questions.
Knowing these profiles and understanding how they will interact with your application is key to defining the functional and non-functional requirements.
Functional requirements
Functional requirements outline the essential features, functions, and capabilities that the proposed solution must deliver to meet the desired objectives. In this case, the solution must do the following:
- Extract information from clickstream data
- Support geographic analysis and user distribution
- Support on-demand updates to get the most up-to-date information
- Let technical users run ad hoc SQL queries
- Store the clickstream data indefinitely
Non-functional requirements
Non-functional requirements describe the qualitative characteristics and constraints that the proposed solution should adhere to, ensuring its overall quality and performance. In this specific case, the non-functional requirements stipulate the following:
- Limited maintenance effort
- Cost-effectiveness
Architecture patterns
The AWS Architecture Center (https://aws.amazon.com/architecture/) offers a set of vetted solutions developed and built by experts from both AWS and AWS Partners, which you can use as a starting point for your projects. For this use case, the AWS Solutions Library includes a solution called Clickstream Analytics on AWS: https://aws.amazon.com/solutions/implementations/clickstream-analytics-on-aws/. This solution focuses on the collection, ingestion, analysis, and visualization of clickstream data from websites and mobile applications, which is in line with this project’s scope.
AWS revises these blueprints and solutions regularly, so they should be fully functional, but each project has its own peculiarities and may require some customization. Even so, they are valuable resources that save time whenever you start a project from scratch, especially during the architecture design phase.
Architecture
As you have done since the first project of this book, you adopt a top-down approach: you start with the requirements, which should be completely service-agnostic and describe the functionality and constraints of your application, and work down to the specific services that support it.
Your business intelligence application can be decomposed into three layers, as shown in Figure 8.1:
Figure 8.1 – Business intelligence application layers
Let us briefly go over the details:
- Ingestion layer: This collects and imports data from various sources into the system.
- Processing layer: This prepares, cleanses, and transforms the data for analysis.
- Visualization layer: This presents the processed data in a visually appealing and interactive manner for exploration and insights.
Understanding the purpose of each layer and its role in the overall application is key to choosing suitable services for it.
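As a rough illustration of how the three layers hand data to one another, here is a small, self-contained sketch. The function names, the JSON field names (visitor_id and page), and the text-based chart are all assumptions chosen for the example; they stand in for the managed services selected later and are not part of the actual solution.

```python
import json
from collections import Counter


def ingest(raw_lines):
    """Ingestion layer: parse JSON lines, skipping anything malformed."""
    records = []
    for line in raw_lines:
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
    return records


def process(records):
    """Processing layer: drop incomplete records and normalize page paths."""
    cleaned = []
    for record in records:
        if not isinstance(record, dict):
            continue
        if not record.get("visitor_id") or not record.get("page"):
            continue  # both hypothetical fields are required
        cleaned.append({**record, "page": record["page"].strip().lower()})
    return cleaned


def visualize(records):
    """Visualization layer stand-in: page views as a text bar chart."""
    views = Counter(r["page"] for r in records)
    return [f"{page:<10} {'#' * count}" for page, count in views.most_common()]


# Made-up input: two messy variants of /home, one bad line, one incomplete record.
raw_lines = [
    '{"visitor_id": "v1", "page": "/Home"}',
    '{"visitor_id": "v2", "page": "/home "}',
    '{"visitor_id": "v3", "page": "/cart"}',
    'not json',
    '{"page": "/home"}',
]

chart = visualize(process(ingest(raw_lines)))
```

Each function only depends on the output of the previous one, which mirrors the reason for separating the layers: you can swap the implementation of one layer without touching the others.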
Simplicity and minimal maintenance effort are the two main pillars of your architecture. Because you want to serve two types of personas, you need two ways of interacting with or exploring the data. At the same time, you don’t want to duplicate the data and create a separate source for each profile, so for the data store you want a solution that is versatile enough to integrate with both a SQL-like engine for exploration and a business intelligence tool for visualization.
After some research, you start exploring Amazon QuickSight for the visualization layer and Amazon Athena for running SQL queries over your clickstream data. To store the data, you decide to go with Amazon S3, since it integrates with both Amazon Athena and Amazon QuickSight and places no limit on the number of objects you can store. For the data transformations, you want a solution that does not require any infrastructure maintenance and, ideally, a tool that lets you leverage your current Spark knowledge, so you choose AWS Glue for your data pipelines.
Considering all the points mentioned, you produce an initial architecture, as shown in Figure 8.2. In summary, files are ingested into the /raw prefix of the bucket and processed by Glue into the /results prefix of the same bucket. Technical users query this data using Athena with SQL-like syntax, while business users get their insights from QuickSight in a visual format.
Figure 8.2 – AWS architecture for your business intelligence application
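The data flow in this architecture can be sketched in a few lines. The example below assumes that object keys are written without a leading slash and that processed objects keep the same relative path, which the text does not specify. The bucket, database, and table names and the athena-output/ location are hypothetical. The second function only builds the keyword arguments that boto3's Athena start_query_execution call accepts; it does not contact AWS.

```python
RAW_PREFIX = "raw/"
RESULTS_PREFIX = "results/"


def results_key_for(raw_key: str) -> str:
    """Map an object key under raw/ to the matching key under results/."""
    if not raw_key.startswith(RAW_PREFIX):
        raise ValueError(f"expected a key under {RAW_PREFIX!r}, got {raw_key!r}")
    return RESULTS_PREFIX + raw_key[len(RAW_PREFIX):]


def athena_query_request(bucket: str, database: str, table: str) -> dict:
    """Build keyword arguments for an Athena start_query_execution call."""
    # Hypothetical ad hoc query: the ten most viewed pages.
    query = (
        f"SELECT page, COUNT(*) AS views "
        f"FROM {table} GROUP BY page ORDER BY views DESC LIMIT 10"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location; this one is made up.
        "ResultConfiguration": {"OutputLocation": f"s3://{bucket}/athena-output/"},
    }


processed_key = results_key_for("raw/2024/01/events.json")
request = athena_query_request(
    "my-clickstream-bucket", "clickstream_db", "clickstream_results"
)
# With AWS credentials configured, the request could be submitted with:
#   boto3.client("athena").start_query_execution(**request)
```

Keeping both prefixes in the same bucket means Athena and QuickSight read from a single copy of the processed data, which matches the goal of not duplicating data per profile.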
Now that you have outlined the high-level architecture, let us dive into the AWS services you have chosen to build this solution and understand how they align with the requirements that were previously established.