You're reading from Hands-On Big Data Modeling: Effective database design techniques for data architects and business intelligence professionals

Product type Paperback

Published in Nov 2018

Publisher Packt

ISBN-13 9781788620901

Length 306 pages

Edition 1st Edition

Languages Python

Tools Bitcoin

Concepts Big Data

Authors (3):

James Lee

Tao Wei

Suresh Kumar Mukhiya

Table of Contents (17 chapters)

Preface

1. Introduction to Big Data and Data Management

2. Data Modeling and Management Platforms

3. Defining Data Models

4. Categorizing Data Models

5. Structures of Data Models

6. Modeling Structured Data

7. Modeling with Unstructured Data

8. Modeling with Streaming Data

9. Streaming Sensor Data

10. Concept and Approaches of Big-Data Management

11. DBMS to BDMS

12. Modeling Bitcoin Data Points with Python

13. Modeling Twitter Feeds Using Python

14. Modeling Weather Data Points with Python

15. Modeling IMDb Data Points with Python

16. Other Books You May Enjoy

VSM with Lucene

The vector space model (VSM), also called the term vector model, is an algebraic model that represents text documents as vectors of identifiers such as index terms. It is used in information filtering, information retrieval, indexing, and relevance ranking.
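To make the vector representation concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization, raw term counts as vector components, and cosine similarity as the relevance measure; none of these choices comes from the text, and the helper names (`build_vocabulary`, `to_vector`, `cosine_similarity`) and sample documents are hypothetical.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Return a sorted list of every distinct term across the documents."""
    return sorted({term for doc in docs for term in doc.lower().split()})


def to_vector(doc, vocabulary):
    """Map a document to a vector of raw term counts, one slot per vocabulary term."""
    counts = Counter(doc.lower().split())
    return [counts[term] for term in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, used only for illustration.
docs = ["big data modeling", "data modeling with lucene", "weather sensor streams"]
vocabulary = build_vocabulary(docs)
vectors = [to_vector(doc, vocabulary) for doc in docs]

# Documents that share terms point in similar directions; disjoint ones score 0.
similar = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
```

Each document becomes one point in a space with one dimension per vocabulary term, so comparing documents reduces to comparing vectors.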

In VSM, weights associated with the terms are calculated based on the following two numbers:

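Term frequency (TF): How many times a particular term appears in a document
Inverse document frequency (IDF): How rare a term is across the whole collection of documents; terms that appear in fewer documents are treated as more informative and receive a higher weight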
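The two quantities above are commonly combined into a single TF-IDF weight per term. The following is a minimal sketch, assuming raw counts for TF and `log(N / df)` for IDF, where `N` is the number of documents and `df` is the number of documents containing the term. Libraries such as Lucene apply their own smoothing and normalization, so their scores will not match these numbers. The function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


# Hypothetical sample documents, used only for illustration.
sample_docs = ["big data", "data model", "data lake"]
weights = tf_idf(sample_docs)

# "data" occurs in every document, so its IDF (and weight) is zero;
# "big" occurs in only one document, so it gets the largest weight there.
```

A term that appears everywhere contributes nothing to distinguishing documents, which is why its weight collapses to zero under this formulation.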

VSM is implemented in, or supported by, many open source tools, including Apache Lucene, Elasticsearch, Gensim, NumPy, Weka, word2vec, and the Konstanz Information Miner (KNIME).

Lucene

In this section, we are going to explore the VSM using an...

Authors (3)

James Lee

James Lee is a passionate software wizard working at one of the top Silicon Valley-based start-ups specializing in big data analysis. He has also worked at Google and Amazon. In his day job, he works with big data technologies, including Cassandra and Elasticsearch, and is an absolute Docker geek and IntelliJ IDEA lover. Apart from his career as a software engineer, he is keen on sharing his knowledge with others and guiding them, especially in relation to start-ups and programming. He has been teaching courses and conducting workshops on Java programming / IntelliJ IDEA since he was 21. James also enjoys skiing and swimming, and is a passionate traveler.

Tao Wei

Tao Wei is a passionate software engineer who works at a leading Silicon Valley-based big data analysis company. Previously, Tao worked at large IT companies, including IBM and Cisco. He has extensive experience in designing and building distributed, large-scale systems with proven high availability and reliability. Tao has an MS degree in computer science from McGill University and many years' experience as a teaching assistant in a variety of computer science classes. In his spare time, he enjoys reading and swimming, and is a passionate photographer.

Suresh Kumar Mukhiya

Suresh Kumar Mukhiya is a PhD candidate, currently affiliated with the Western Norway University of Applied Sciences (HVL). He is a big data enthusiast, specializing in information systems, model-driven software engineering, big data analysis, artificial intelligence, and frontend development. He completed a master's degree in information systems at the Norwegian University of Science and Technology (NTNU, Norway), with a thesis on process mining. He also holds a bachelor's degree in computer science and information technology (BSc.CSIT) from Tribhuvan University, Nepal, where he received the Vice-Chancellor's Award for obtaining the highest score. He is a passionate photographer and a resilient traveler.