To train machine learning models to detect malware, data scientists and malware analysts can draw on many publicly available datasets. For example, the following websites let security researchers and machine learning enthusiasts download a wide range of malware samples:
- Malware-Traffic-Analysis: https://www.malware-traffic-analysis.net/
- Kaggle Malware Families: https://www.kaggle.com/c/malware-classification
- VX Heaven: http://83.133.184.251/virensimulation.org/index.html
- VirusTotal: https://www.virustotal.com
- VirusShare: https://virusshare.com
To work with PE files, I highly recommend using an amazing Python library called pefile. pefile gives you the ability to inspect headers, analyze sections, and retrieve data, in addition to other capabilities, like packer detection and PEiD signature generation...
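As a rough illustration of the header and section inspection mentioned above, here is a minimal sketch using pefile. The function names `section_name` and `summarize_pe` and the layout of the returned dictionary are my own assumptions for this example, not part of the library. It assumes pefile is installed (`pip install pefile`) and that the path you pass points to a valid PE file.

```python
def section_name(raw: bytes) -> str:
    """Decode a PE section name, which is stored as null-padded bytes."""
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


def summarize_pe(path: str) -> dict:
    """Collect a few header, section, and import details from a PE file.

    Hypothetical helper for illustration; the dictionary keys are arbitrary.
    """
    import pefile  # third-party dependency: pip install pefile

    pe = pefile.PE(path)
    try:
        summary = {
            # Fields from the COFF file header and the optional header
            "machine": hex(pe.FILE_HEADER.Machine),
            "number_of_sections": pe.FILE_HEADER.NumberOfSections,
            "entry_point": hex(pe.OPTIONAL_HEADER.AddressOfEntryPoint),
            "image_base": hex(pe.OPTIONAL_HEADER.ImageBase),
            # One entry per section, including its byte entropy
            "sections": [
                {
                    "name": section_name(section.Name),
                    "virtual_address": hex(section.VirtualAddress),
                    "raw_size": section.SizeOfRawData,
                    "entropy": round(section.get_entropy(), 2),
                }
                for section in pe.sections
            ],
            "imports": {},
        }

        # DIRECTORY_ENTRY_IMPORT is only present when the file has imports
        for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
            dll = entry.dll.decode("ascii", errors="replace")
            summary["imports"][dll] = [
                imp.name.decode("ascii", errors="replace")
                if imp.name
                else f"ordinal_{imp.ordinal}"  # imported by ordinal only
                for imp in entry.imports
            ]
        return summary
    finally:
        pe.close()
```

Calling `summarize_pe("sample.exe")` (a placeholder path) returns a dictionary that you can print or turn into features for a model. Only analyze real malware samples inside an isolated environment. Parsing a file with pefile does not execute it, but handling live samples still carries risk.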