We are often required to log into a site before we can crawl its content. This is usually done through a form where we enter a user name and password, press Enter, and are then granted access to previously hidden content. This type of form authentication is often called cookie authorization, because when we log in, the server creates a cookie that it can use on later requests to verify that we have signed in. Scrapy respects these cookies, so all we need to do is automate submitting the form during our crawl.
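To make the cookie mechanism concrete, here is a minimal sketch that uses only the Python standard library rather than Scrapy. Everything in it is an assumption made for illustration: the toy server, the `/login` and `/secured` paths, the `username` and `password` field names, the credentials, and the cookie value are all hypothetical and are not part of the site used in this recipe. It shows a form POST that sets a cookie, followed by a request that succeeds only because the client sends that cookie back.

```python
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials and cookie value, used only by this toy server.
USERS = {"alice": "s3cret"}
SESSION_COOKIE = "session=abc123"


class ToyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the URL-encoded form body and check the credentials.
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        user = form.get("username", [""])[0]
        password = form.get("password", [""])[0]
        if USERS.get(user) == password:
            # A successful login hands the client a session cookie.
            self.send_response(200)
            self.send_header("Set-Cookie", SESSION_COOKIE + "; Path=/")
            self._finish(b"logged in")
        else:
            self.send_response(401)
            self._finish(b"bad credentials")

    def do_GET(self):
        # Serve the protected content only when the session cookie is sent.
        if SESSION_COOKIE in self.headers.get("Cookie", ""):
            self.send_response(200)
            self._finish(b"secret content")
        else:
            self.send_response(403)
            self._finish(b"login required")

    def _finish(self, body):
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_secured(base_url, username, password):
    """Submit the login form, then request the secured page with the same cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),         # ignore any proxy settings
        urllib.request.HTTPCookieProcessor(jar)  # store and resend cookies
    )
    form = urllib.parse.urlencode({"username": username, "password": password})
    try:
        opener.open(base_url + "/login", data=form.encode()).close()
    except urllib.error.HTTPError:
        pass  # login rejected, so no cookie was stored
    try:
        with opener.open(base_url + "/secured") as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode()


def run_demo():
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), ToyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base_url = "http://127.0.0.1:%d" % server.server_port
    try:
        return (fetch_secured(base_url, "alice", "s3cret"),
                fetch_secured(base_url, "alice", "wrong"))
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

With the correct password, the second request carries the cookie and the secured page is returned; with the wrong password, no cookie is stored and the same request is refused. The cookie jar here plays the role that Scrapy's cookie handling plays during a crawl.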
Handling forms and forms-based authorization
Getting ready
We will crawl a page in the container's website at the following URL: http://localhost:5001/home/secured. On this page, and on the pages it links to, there is content we would like to scrape...