Vector Search for Practitioners with Elastic

You're reading from Vector Search for Practitioners with Elastic: A toolkit for building NLP solutions for search, observability, and security using vector search

Product type: Paperback
Published: Nov 2023
Publisher: Packt
ISBN-13: 9781805121022
Length: 240 pages
Edition: 1st Edition
Authors (2):
Jeff Vestal
Bahaaldine Azarmi

Table of Contents (17 chapters)

Preface
1. Part 1: Fundamentals of Vector Search
2. Chapter 1: Introduction to Vectors and Embeddings
3. Chapter 2: Getting Started with Vector Search in Elastic
4. Part 2: Advanced Applications and Performance Optimization
5. Chapter 3: Model Management and Vector Considerations in Elastic
6. Chapter 4: Performance Tuning – Working with Data
7. Part 3: Specialized Use Cases
8. Chapter 5: Image Search
9. Chapter 6: Redacting Personal Identifiable Information Using Elasticsearch
10. Chapter 7: Next Generation of Observability Powered by Vectors
11. Chapter 8: The Power of Vectors and Embedding in Bolstering Cybersecurity
12. Part 4: Innovative Integrations and Future Directions
13. Chapter 9: Retrieval Augmented Generation with Elastic
14. Chapter 10: Building an Elastic Plugin for ChatGPT
15. Index
16. Other Books You May Enjoy

Introduction to the Enron email dataset (ham or spam)

The Enron dataset is a large collection of email data that has become a staple in the world of text analysis and machine learning. It’s like a vast library, filled with a diverse range of texts that offers a wealth of insights for those who know how to interpret them.

This dataset was originally made public during the legal investigation into Enron Corporation, a US energy company that collapsed in 2001 due to widespread corporate fraud. The dataset contains over 600,000 emails from about 150 users, mostly senior management of Enron, making it one of the few publicly available collections of real emails of its size.

For our purposes, the emails contained in the Enron dataset have been labeled as ham (legitimate) or spam (phishing). This labeling provides a valuable ground truth, allowing us to train and test models for phishing detection. Labeling tells us which emails are safe and which are dangerous, helping us to...
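
To make the role of ground-truth labels concrete, here is a minimal sketch in Python. It is not the book's code: the records are hypothetical toy examples (not taken from the Enron dataset), and the helper names `stratified_split` and `accuracy` are assumptions introduced only for illustration. It shows how ham/spam labels let us hold out a test set and score any predictor against known answers.

```python
import random
from collections import defaultdict

# Hypothetical toy records for illustration only; NOT taken from the Enron dataset.
LABELED_EMAILS = [
    ("Meeting moved to 3pm, see updated agenda", "ham"),
    ("Please review the attached quarterly report", "ham"),
    ("Lunch on Friday?", "ham"),
    ("Reminder: timesheets are due today", "ham"),
    ("Verify your account now or it will be suspended", "spam"),
    ("You have won a prize, click here to claim", "spam"),
    ("Urgent: confirm your password at this link", "spam"),
    ("Your mailbox is full, log in here to restore access", "spam"),
]


def stratified_split(records, test_fraction=0.25, seed=42):
    """Split (text, label) pairs so each label appears in both train and test."""
    by_label = defaultdict(list)
    for text, label in records:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


def accuracy(predict, labeled):
    """Fraction of emails whose predicted label matches the ground-truth label."""
    correct = sum(1 for text, label in labeled if predict(text) == label)
    return correct / len(labeled)


train_set, test_set = stratified_split(LABELED_EMAILS)

# A trivial baseline that calls everything "ham"; a real model would be trained on train_set.
baseline_accuracy = accuracy(lambda text: "ham", test_set)
```

Because the split is stratified, the held-out set contains both classes, so a baseline that always answers "ham" is only right on the ham portion. Any real phishing detector should be judged against that kind of baseline.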
