Data storage: HDF5

3 minute read

In this lesson we will learn how to store different kind of data on disk. For this purpose we will use JLD.jl, a Julia dialect of HDF5, which is a file format designed to store and organise large amounts of data.

Operating on .jld files

Installing JLD.jl

First of all we need to install JLD, to do it type the following code:

using Pkg
Pkg.add("JLD")

Exporting data

JLD can save to disk almost any form of data, including variables, dictionaries and even concrete types. In order to save some data, we first need to create a dictionary containing a string identifier for each element (the key in the dictionary) and the data. Then we can export that dictionary with the save function. For example, we can do it in this way:

using JLD

x = collect(-3:0.1:3)
y = collect(-3:0.1:3)

xx = reshape([xi for xi in x for yj in y], length(y), length(x))
yy = reshape([yj for xi in x for yj in y], length(y), length(x))
                                
z = sin.(xx .+ yy.^2)

data_dict = Dict("x" => x, "y" => y, "z" => z)

save("data_dict.jld", data_dict)

At line 3-4 we define x and y (remember that collect transforms a range into an array), at line 6-7 we create a grid of x and y to compute all the possible combinations of x and y. At line 9 we compute z and at line 11 we create a dictionary containing the variables that we want to store: x, y and z. At line 13 we export data_dict through the save function to a file called data_dict.jld.

Reading data

In order to demonstrate that the data is actually read from disk, please restart the REPL.

It is possible to read a .jld file through the load function:

using JLD
data_dict2 = load("data_dict.jld")

We can now inspect the content of data_dict2 and perform some operations with the loaded data, for example we can plot it:

x2 = data_dict2["x"]
y2 = data_dict2["y"]
z2 = data_dict2["z"]

using Plots
plotly()

plot(x2, y2, z2, st = :surface, color = :ice)

Structures

It is also possible to store structures in .jld archives, which is done in the following way:

using JLD
struct Person
    height::Float64
    weight::Float64
end

bob = Person(1.84, 74)

dict_new = Dict("bob" => bob)
save("bob.jld", dict_new)

The file is loaded in the same way as before with one exception: the Person structure should be defined before loading the archive. Before running the following code please restart the REPL.

using JLD
struct Person
    height::Float64
    weight::Float64
end
bob2 = load("bob.jld")

>>>bob2["bob"]
Person(1.84, 74.0)

If we restart the REPL and we omit redefining Person, we get the following output:

using JLD

>>>bob3 = load("bob.jld")
Warning: type Person not present in workspace; reconstructing
    
>>>bob3["bob"]
JLD.var"##Person#402"(1.84, 74.0)
    
>>>bob3["bob"].height
1.84

As you can see at line 4, we were able to import the file but we didn’t get a Person structure (line 7), as Person was not known at the time of the import. Nonetheless we can retrieve the data stored inside bob, as shown at line 9.

At the time of writing, it is not possible to store data with units of measurement inside .jld files.

JLD2

In some cases, it might be necessary to store data in a more flexible or performant way. JLD2 is an evolution of JLD which supports more extensively data types natives of Julia. Furthermore, it is implemented 100% in Julia. This package is still in active development. While its features are extremely nice, and it often outperforms JLD and other HDF5 implementations, if stability is your main concern you might want to stick to JLD.

JLD2 has the same interface as JLD, but it exports files in the .jld2 format. For example:

using JLD2
using Unitful

data_dict = Dict{String,Any}()
a = [1,2,3]*u"m"
data_dict["a"]=a
save("data_dict.jld2", data_dict)

data_dict_loaded = load("data_dict.jld2")

data_dict["a"] == data_dict_loaded["a"]

Conclusions

In this lesson we have learned how it is possible to store and retrieve data using JLD and JLD2. Moreover, in the case of structures, we have seen that it is better to define the desired structure before importing the data.

If you liked this lesson and you would like to receive further updates on what is being published on this website, I encourage you to subscribe to the newsletter! If you have any question or suggestion, please post them in the discussion below!

Thank you for reading this lesson and see you soon on TechyTok!

Leave a comment