You're reading from AI-Assisted Programming for Web and Machine Learning Improve your development workflow with ChatGPT and GitHub Copilot

Product type Paperback

Published in Aug 2024

Publisher Packt

ISBN-13 9781835086056

Length 602 pages

Edition 1st Edition

Languages

Python

Tools

ChatGPT

Concepts

Artificial Intelligence

Authors (5):

Marina Fernandez

Ajit Jaokar

Anjali Jain

Christoffer Noring

Ayşe Mutlu

Table of Contents (25) Chapters

Preface

1. It’s a New World, One with AI Assistants, and You’re Invited

2. Prompt Strategy

3. Tools of the Trade: Introducing Our AI Assistants

4. Build the Appearance of Our App with HTML and Copilot

5. Style the App with CSS and Copilot

6. Add Behavior with JavaScript

7. Support Multiple Viewports Using Responsive Web Layouts

8. Build a Backend with Web APIs

9. Augment Web Apps with AI Services

10. Maintaining Existing Codebases

11. Data Exploration with ChatGPT

12. Building a Classification Model with ChatGPT

13. Building a Regression Model for Customer Spend with ChatGPT

14. Building an MLP Model for Fashion-MNIST with ChatGPT

15. Building a CNN Model for CIFAR-10 with ChatGPT

16. Unsupervised Learning: Clustering and PCA

17. Machine Learning with Copilot

18. Regression with Copilot Chat

19. Regression with Copilot Suggestions

20. Increasing Efficiency with GitHub Copilot

21. Agents in Software Development

22. Conclusion

23. Other Books You May Enjoy

24. Index

Prompt strategy for data science

Let’s run the same thought experiment for data science that we ran for web development. We’ll apply the two guidelines presented earlier, “problem breakdown” and “generate prompts,” and, just as in the web development section, draw some general conclusions about the domain and present them as a prompt strategy for data science.

Problem breakdown: predict sales

Let’s say we’re building a machine-learning model to predict sales. At a high level, we understand what the system should do. To solve the problem though, we need to divide it into smaller parts, which in data science usually entails the following components:

Data: The data is the part of the system that stores information. The data can come from many places like databases, web endpoints, static files, and more.
Model: The model is responsible for learning from the data and producing a prediction that’s as accurate as possible. To make a prediction, the model takes an input and produces one or more outputs.
Training: Training is the part of the system that fits the model to the data. Here, you typically use one part of your data for training and hold back another part as sample data for evaluation.
Evaluation: To ensure your model works as intended, you need to evaluate it. Evaluation means taking the data and model and producing a score that indicates how well the model performs.
Visualization: Visualization is the part where you can gain insights valuable for the business via graphs. This part is very important, as it’s the part that’s most visible to the business.

Further breakdown into features/steps for data science

At this point, you’re at too high a level to start writing prompts. We can break it down further by looking at each step:

Data: The data part has many steps, including collecting the data, cleaning it, and transforming it. Here’s how you can break it down:
1. Collect data: The data needs to be collected from somewhere. It could be a database, a web endpoint, a static file, and so on.
2. Clean data: The data needs to be cleaned. Cleaning means removing data that’s not relevant, removing duplicates, and so on.
3. Transform data: The data needs to be transformed. Transformation means changing the data to a format that’s useful for the model.
Training: Just like the data part, the training part has many steps to it. Here’s how you can break it down:
1. Split data: The data needs to be split into training and sample data. The training data is used to train the model and the sample data is used to evaluate the model.
2. Train model: The model needs to be trained. Training means taking the training data and learning from it.
Evaluation: The evaluation part is usually a single step but can be broken down further.
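The steps above can be sketched end to end in a few lines. The following is a minimal illustration using only the Python standard library, not the approach the later chapters take: the records, the column meanings (advertising spend and sales), and every function name are hypothetical, and the "model" is a plain least-squares line chosen only to keep the sketch short.

```python
import random

# Hypothetical raw records: (advertising_spend, sales). None marks a missing value.
# The values are constructed so that sales = 2 * spend + 5.
raw_rows = [
    (10.0, 25.0), (20.0, 45.0), (20.0, 45.0),  # second (20.0, 45.0) is a duplicate
    (30.0, None),                               # missing sales value
    (40.0, 85.0), (50.0, 105.0), (60.0, 125.0), (70.0, 145.0),
]

def clean(rows):
    """Clean data: drop rows with missing values and exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned

def transform(rows):
    """Transform data: rescale spend to units of 10."""
    return [(spend / 10.0, sales) for spend, sales in rows]

def split(rows, test_fraction=0.25, seed=0):
    """Split data: shuffle, then hold out a fraction as sample data."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def train(rows):
    """Train model: least-squares fit of sales = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

def evaluate(model, rows):
    """Evaluation: mean absolute error on the held-out sample data."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)

train_rows, sample_rows = split(transform(clean(raw_rows)))
model = train(train_rows)
mae = evaluate(model, sample_rows)
print(f"slope={model[0]:.2f}, intercept={model[1]:.2f}, MAE={mae:.4f}")
```

Because the made-up data lies exactly on a line, the error on the sample data should be close to zero. With real sales data you would expect a non-zero error and would likely reach for a library rather than hand-written functions.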

Generate prompts for each step

Note how our breakdown for data science looks a bit different from the one for web development. Instead of features like Add inventory, we have features like Collect data.

Still, we’re now at the right level to author a prompt, so let’s use the Collect data feature as our example:

[Prompt]

Collect data from data.xls and read it into a DataFrame using Pandas library.

[End of prompt]

The preceding prompt is both general and specific at the same time. It’s general in that it only says to “collect data,” but specific in that it names the library to use and even the data structure to load into (a DataFrame). It’s entirely possible that a simpler prompt would have worked for this step, like so:

[Prompt]

Collect data from data.xls.

[End of prompt]

Whether the simpler prompt is enough may vary depending on whether you use a tool like ChatGPT or GitHub Copilot.
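To make the Collect data step concrete, here is a sketch of the kind of code an assistant might produce for the more specific prompt. It is an illustration, not actual tool output. The helper names `reader_name` and `collect_data` are hypothetical. The pandas functions `read_excel` and `read_csv` are real, but reading an .xls file also assumes that a suitable Excel engine (such as xlrd) is installed alongside pandas.

```python
from pathlib import Path

# Map file suffixes to the name of the pandas reader function to use.
READERS = {".xls": "read_excel", ".xlsx": "read_excel", ".csv": "read_csv"}

def reader_name(path):
    """Return the name of the pandas reader that matches the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"No reader configured for {suffix!r} files")
    return READERS[suffix]

def collect_data(path):
    """Collect data: read the file at `path` into a pandas DataFrame."""
    import pandas as pd  # imported lazily so the helper above works without pandas
    return getattr(pd, reader_name(path))(path)

# Example usage (assumes pandas is installed and data.xls exists):
# df = collect_data("data.xls")
# print(df.head())
```

Naming the file and the library in the prompt removes two guesses the assistant would otherwise have to make: which file to open and which reader to call.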

Identify some basic principles for data science, “a prompt strategy for data science”

Here, we’ve identified principles similar to those in the web development example:

Provide context – filename: A data file can have any name, so it’s important to specify the name of the file.
Specify how – libraries: There are many ways to load a data file, and even though the Pandas library is a common choice, it’s important to specify it. There are other libraries to work with, and you might need a solution for Java, C#, or Rust, for example, where libraries are named differently.
Iterate: It’s worth iterating on the prompt, rephrasing it, and adding separators like a comma, a colon, and so on.
Be context-aware: Here too, context matters a lot. If you’re working in a notebook, previous cells will be available to GitHub Copilot; in a chat, previous conversation turns will be available to ChatGPT; and so on.
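One way to practice these principles is to treat the prompt as something you assemble from parts, so that the context (filename) and the “how” (library, data structure) can be added or dropped while you iterate. The helper below is a hypothetical sketch for illustration; `build_prompt` and its parameters are assumptions, not part of any tool.

```python
def build_prompt(action, filename=None, library=None, structure=None):
    """Assemble a single-step prompt from an action plus optional details.

    filename  -> "Provide context"
    library   -> "Specify how"
    structure -> the data structure the result should be loaded into
    """
    prompt = action
    if filename:
        prompt += f" from {filename}"
    if structure:
        prompt += f" and read it into a {structure}"
    if library:
        prompt += f" using the {library} library"
    return prompt + "."

# Iterate: start simple, then add detail if the answer is not what you need.
simple = build_prompt("Collect data", filename="data.xls")
specific = build_prompt(
    "Collect data", filename="data.xls", library="Pandas", structure="DataFrame"
)
print(simple)
print(specific)
```

Starting from the simple variant and adding detail only when the response misses the mark keeps each iteration small and easy to compare.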

As you can see from the preceding guidance, the strategy is very similar to the one for web development. Here, too, we list “Provide context,” “Specify how,” “Iterate,” and “Be context-aware.” The big difference lies in the details. However, there’s an alternative strategy that works in data science: lengthy prompts. Even though we’ve broken down the data science problem into features, we don’t need to write one prompt per feature. Instead, you can express everything you want carried out in one large prompt. Such a prompt could look like this:

[Prompt]

You want to predict sales on the file data.xls. Use Python and the Pandas library. Here are the steps that you should carry out:

Collect data
Clean data
Transform data
Split data
Train model
Evaluation

[End of prompt]
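Since the lengthy prompt is just a goal, some context, and the list of steps from the breakdown, it can be generated from that list. The sketch below shows one way to do so, assuming the file data.xls named in the earlier prompts. The function `build_long_prompt` and the numbering of the steps are illustrative choices, not something the tools require.

```python
# The steps identified in the problem breakdown, in the order they should run.
STEPS = [
    "Collect data",
    "Clean data",
    "Transform data",
    "Split data",
    "Train model",
    "Evaluation",
]

def build_long_prompt(goal, filename, tools, steps):
    """Combine the goal, context, and ordered steps into one large prompt."""
    header = (
        f"You want to {goal} on the file {filename}. Use {tools}. "
        "Here are the steps that you should carry out:"
    )
    numbered = [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    return "\n".join([header, "", *numbered])

long_prompt = build_long_prompt(
    goal="predict sales",
    filename="data.xls",
    tools="Python and the Pandas library",
    steps=STEPS,
)
print(long_prompt)
```

Keeping the steps in a list makes it easy to switch between the two strategies: send the whole list as one prompt, or loop over it and send one smaller prompt per step.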

You will see examples in future chapters on data science and machine learning where both shorter and lengthier prompts are used. Which approach to use is up to you.
