You're reading from Mastering Julia Enhance your analytical and programming skills for data modeling and processing with Julia

Product type Paperback

Published in Jan 2024

Publisher Packt

ISBN-13 9781805129790

Length 506 pages

Edition 2nd Edition

Languages

Julia

Concepts

Programming Language

Author (1):

Malcolm Sherrington

Data arrays and data frames

Users of R will be aware of the success of data frames when employed in analyzing datasets, a success that has been mirrored by Python with the pandas package.

Julia also provides data frame support through the DataFrames package.

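The package builds on Julia's base and works with three basic types, as follows:

missing: The single value of type Missing, indicating that a data value is missing (part of Base since Julia 1.0)
DataArray: An extension to the Array type that can contain missing values (in current Julia, an ordinary array with element type Union{Missing, T} fills this role)
DataFrame: A data structure for representing tabular datasets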
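
The behavior of missing values can be tried out before any package is loaded. The following is a minimal sketch that assumes Julia 1.0 or later, where missing and Missing are part of Base; the vector x and its values are hypothetical and chosen only for illustration:

```julia
# A plain vector containing one missing value (no packages needed).
x = [10.1, 7.9, missing, 4.5]

t    = typeof(missing)     # Missing
et   = eltype(x)           # Union{Missing, Float64}
mask = ismissing.(x)       # true only at index 3
y    = x[3] + 1.0          # arithmetic on missing yields missing
n    = count(ismissing, x) # number of missing entries: 1
```

Note that missing propagates through arithmetic rather than raising an error, which is why aggregate functions need to be told explicitly how to treat it.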

It is such a large topic that we will be looking at data frames in some depth when we consider statistical computing.

However, here’s some code to get a flavor of processing data with these packages:

julia> using DataFrames
julia> df1 = DataFrame(ID = 1:4,
                       Cost = [10.1,7.9,missing,4.5])
4×2 DataFrame
│ Row │ ID │ Cost    │
├─────┼────┼─────────┤
│  1  │  1 │ 10.1    │
│  2  │  2 │ 7.9     │
│  3  │  3 │ missing │
│  4  │  4 │ 4.5     │

Common operations, such as computing the mean or variance of the Cost column, return missing because of the missing value in row 3:

julia> using Statistics
julia> mean(df1[!, :Cost])
missing
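
Missing values can also be skipped at the point of computation, without changing the data. The sketch below uses skipmissing from Base together with the Statistics standard library; the plain vector cost is a hypothetical stand-in for the Cost column, so that the example runs without DataFrames installed:

```julia
using Statistics

# Hypothetical stand-in for the Cost column, including the missing entry.
cost = [10.1, 7.9, missing, 4.5]

m_all  = mean(cost)               # missing: the missing value propagates
m_skip = mean(skipmissing(cost))  # mean of the three observed values (about 7.5)
v_skip = var(skipmissing(cost))   # sample variance of the observed values
n_obs  = count(!ismissing, cost)  # number of observed values: 3
```

skipmissing returns a lazy iterator over the non-missing elements, so no copy of the data is made.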

We can create a new data frame by dropping all rows with missing values, after which statistical functions can be applied as normal:

julia> df2 = dropmissing(df1)
3×2 DataFrame
│ Row │ ID │ Cost │
├─────┼────┼──────┤
│  1  │  1 │ 10.1 │
│  2  │  2 │ 7.9  │
│  3  │  4 │ 4.5  │
julia> (μ,σ) = (mean(df2[!,:Cost]),std(df2[!,:Cost]))
(7.5, 2.8213471959331766)

We will cover data frames in much greater detail when we consider data I/O in Chapter 6.

For now, we will look at the Tables API, implemented in the Tables.jl package, which is used by a large number of packages.
