Connecting to data sources
By this point, you should have a list of data sources and an idea of what data each one holds. Depending on your use case, some of these might be real-time streaming sources that you need to tap into. Here are some typical sources of data:
- Filesystems
- Excel files
- SQL databases
- Amazon S3 buckets
- Hadoop Distributed File System (HDFS)
- NoSQL databases
- Data warehouses
- Data lakes
- Graph databases
- Data streams
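As a rough illustration of how two of the sources listed above (a file on a filesystem and a SQL database) might be pulled into pandas, consider the following minimal sketch. The file name, table name, column names, and helper functions are hypothetical, and an in-memory SQLite database stands in for whatever SQL database you actually use:

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd


def load_from_csv(path):
    """Read a delimited file from the local filesystem into a DataFrame."""
    return pd.read_csv(path)


def load_from_sql(connection, query):
    """Run a SQL query and return the result set as a DataFrame."""
    return pd.read_sql_query(query, connection)


# Hypothetical sample data so that the sketch runs on its own.
sample = pd.DataFrame(
    {"customer_id": [1, 2, 3], "spend": [120.0, 75.5, 310.25]}
)

# Filesystem source: write a temporary CSV file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "customers.csv"
    sample.to_csv(csv_path, index=False)
    from_file = load_from_csv(csv_path)

# SQL source: an in-memory SQLite database stands in for a real server.
connection = sqlite3.connect(":memory:")
sample.to_sql("customers", connection, index=False)
from_db = load_from_sql(
    connection,
    "SELECT customer_id, spend FROM customers WHERE spend > 100",
)
connection.close()

print(from_file.shape)
print(from_db)
```

Excel files can be read in a similar way with `pd.read_excel`, which needs an engine such as openpyxl installed. Sources such as Amazon S3, HDFS, or data streams typically require additional client libraries and credentials, which are not shown here.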
The mechanism you use to access the data depends on the type of data source, which could be on-premises or in the cloud. Depending on the condition of the data, you can either bring it directly into DataRobot or do some preparation first. DataRobot has recently added capabilities in the form of Paxata to help with this process, but you might not have access to that add-on. Most of the processing work is done via SQL, Python, pandas, and Excel. For the...