<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.9.0">Jekyll</generator><link href="https://mirkobronzi.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://mirkobronzi.github.io/" rel="alternate" type="text/html" /><updated>2021-08-27T13:19:06+00:00</updated><id>https://mirkobronzi.github.io/feed.xml</id><title type="html">Mirko Bronzi</title><subtitle>Natural Language Processing expert and Machine/Deep Learning enthusiast, fascinated by Software Engineering.</subtitle><author><name>Mirko Bronzi</name></author><entry><title type="html">Deep Learning Project Template (Cookiecutter)</title><link href="https://mirkobronzi.github.io/dl-project-template/" rel="alternate" type="text/html" title="Deep Learning Project Template (Cookiecutter)" /><published>2021-07-29T00:00:00+00:00</published><updated>2021-07-29T00:00:00+00:00</updated><id>https://mirkobronzi.github.io/dl-project-template</id><content type="html" xml:base="https://mirkobronzi.github.io/dl-project-template/">&lt;h2 id=&quot;tldr&quot;&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;Starting a new Deep Learning (DL) project usually means deciding between a flexible but
less scalable approach in the long term (e.g., &lt;a href=&quot;https://colab.research.google.com/&quot;&gt;Google Colab&lt;/a&gt;),
or a more organized one that requires a significant amount of time and work to set up properly
(in particular when we plan to use several tools that should interact with one another).&lt;/p&gt;

&lt;p&gt;For our new DL projects, we created a &lt;a href=&quot;https://github.com/mila-iqia/cookiecutter-pyml&quot;&gt;project template&lt;/a&gt;
(Cookiecutter) that can be instantiated in minutes, providing code that runs
out of the box.
The instantiated project contains tools for Deep Learning (&lt;a href=&quot;https://pytorch.org/&quot;&gt;PyTorch&lt;/a&gt;,
&lt;a href=&quot;https://www.pytorchlightning.ai/&quot;&gt;PyTorch Lightning&lt;/a&gt;, &lt;a href=&quot;https://www.tensorflow.org/&quot;&gt;TensorFlow&lt;/a&gt;,
&lt;a href=&quot;https://www.tensorflow.org/api_docs/python/tf/keras&quot;&gt;Keras&lt;/a&gt;), tools that help run experiments
(&lt;a href=&quot;https://mlflow.org/&quot;&gt;MLflow&lt;/a&gt;, &lt;a href=&quot;https://github.com/Epistimio/orion&quot;&gt;Orion&lt;/a&gt;),
and tools that help with code best practices (&lt;a href=&quot;https://docs.pytest.org/en/6.2.x/&quot;&gt;pytest&lt;/a&gt;,
Continuous Integration, documentation management with &lt;a href=&quot;https://www.sphinx-doc.org/en/master/#&quot;&gt;Sphinx&lt;/a&gt;).&lt;/p&gt;

&lt;h2 id=&quot;what-is-it&quot;&gt;What is it&lt;/h2&gt;

&lt;p&gt;A project template (sometimes called a Cookiecutter) is a skeleton that gets instantiated
(usually offering various options to customize the project setup) to bootstrap the project code/environment.
This saves a lot of time (in particular if the team starts new projects often).&lt;/p&gt;

&lt;p&gt;Our &lt;a href=&quot;https://github.com/mila-iqia/cookiecutter-pyml&quot;&gt;project template&lt;/a&gt; prepares a setup that can be
used for Deep Learning projects. In particular, it provides all the files/folders
that take care of the boilerplate code needed to run experiments.
A README file is also provided with the final instructions to complete the setup, and with
instructions on how to train models, run tests, etc.&lt;/p&gt;

&lt;p&gt;The goal is to provide all the tools to take care of the research part, as well as
the tools that help the developer follow best practices (to keep the project
code manageable over time).&lt;/p&gt;

&lt;h3 id=&quot;deep-learning-setup&quot;&gt;Deep Learning setup&lt;/h3&gt;

&lt;p&gt;The code is based on either &lt;a href=&quot;https://www.pytorchlightning.ai/&quot;&gt;PyTorch Lightning&lt;/a&gt; or
&lt;a href=&quot;https://www.tensorflow.org/api_docs/python/tf/keras&quot;&gt;Keras&lt;/a&gt; (this is a choice left to the developer).
Logging is performed using MLflow, and hyper-parameter search can be performed using
&lt;a href=&quot;https://github.com/Epistimio/orion&quot;&gt;Orion&lt;/a&gt;.
The code is already set up to use those libraries.&lt;/p&gt;

&lt;h3 id=&quot;code-best-practices&quot;&gt;Code best practices&lt;/h3&gt;

&lt;p&gt;A good strategy to help implement best practices is to use automatic checks for the code and the documentation,
and to write tests that verify the code logic. It is a good habit to run all those checks
every time the code is pushed to the server. This can be done by a human, but it is better done
through a continuous integration (CI) process.
To help with this, the instantiated project supports CI tools such as &lt;a href=&quot;https://github.com/features/actions&quot;&gt;GitHub Actions&lt;/a&gt;,
Azure and &lt;a href=&quot;https://travis-ci.com/&quot;&gt;Travis&lt;/a&gt;. In fact,
the related configuration files are already provided and ready to run &lt;a href=&quot;https://flake8.pycqa.org/en/latest/&quot;&gt;flake8&lt;/a&gt;
to check the code format, &lt;a href=&quot;https://www.sphinx-doc.org/en/master/#&quot;&gt;Sphinx&lt;/a&gt; to check that the documentation builds,
and &lt;a href=&quot;https://docs.pytest.org/en/6.2.x/&quot;&gt;pytest&lt;/a&gt; to run the tests.&lt;/p&gt;

&lt;h2 id=&quot;how-to-use&quot;&gt;How to use&lt;/h2&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/mila-iqia/cookiecutter-pyml&quot;&gt;project template&lt;/a&gt; is instantiated with a simple command:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;pip install -U cookiecutter
cookiecutter https://github.com/mila-iqia/cookiecutter-pyml.git
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This command will ask some questions and instantiate the project skeleton.
In particular, it will ask whether the DL backbone should be PyTorch (with PyTorch Lightning) or TensorFlow (with Keras).&lt;/p&gt;

&lt;p&gt;After the instantiation is done, the developer will find a working project that can be run right away
(the README file - in the project itself - contains the last steps needed to complete the setup,
such as installing the dependencies).
The project runs out of the box because the code includes some synthetic data, a data loader, and a model, all based on a toy task
(i.e., given a sequence of numbers, compute their sum).&lt;/p&gt;
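&lt;p&gt;To make the toy task concrete, here is a minimal sketch (in plain Python, with hypothetical names - this is not the template’s actual code) of how such synthetic data could be generated:&lt;/p&gt;

```python
import random

def make_toy_example(seq_len=5, max_value=9):
    """Build one toy example: a sequence of numbers and their sum as target.

    Illustrative only: the function name and signature are assumptions,
    not the template's actual API.
    """
    sequence = [random.randint(0, max_value) for _ in range(seq_len)]
    target = sum(sequence)
    return sequence, target

# A small synthetic dataset for the "sum the sequence" toy task.
dataset = [make_toy_example() for _ in range(100)]
```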

&lt;p&gt;This is a tree view of an initialized repository:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;.
├── LICENSE
├── README.md
├── config
│   └── hooks
│       └── pre-commit
├── docs
│   ├── ...
├── examples
│   ├── data
│   │   ├── ...
│   ├── local
│   │   ├── config.yaml
│   │   └── run.sh
│   ├── local_orion
│   │   ├── config.yaml
│   │   ├── orion_config.yaml
│   │   └── run.sh
│   ├── slurm
│   │   ├── ...
│   └── slurm_orion
│       ├── ...
├── setup.py
├── tests
│   └── test_hp_utils.py
└── wonderful_project
    ├── __init__.py
    ├── data
    │   ├── __init__.py
    │   └── data_loader.py
    ├── main.py
    ├── models
    │   ├── __init__.py
    │   ├── model_loader.py
    │   ├── my_model.py
    │   └── optim.py
    ├── train.py
    └── utils
        ├── __init__.py
        ├── file_utils.py
        ├── hp_utils.py
        ├── logging_utils.py
        └── reproducibility_utils.py
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;p&gt;Of course, it’s the developer’s job to change those elements to address their task of interest.&lt;/p&gt;

&lt;p&gt;To run the model provided for the toy task, it is enough to run:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd examples/local
sh run.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Or, if on a cluster that supports &lt;a href=&quot;https://slurm.schedmd.com/sbatch.html&quot;&gt;Slurm&lt;/a&gt;, the command is:&lt;/p&gt;

&lt;div class=&quot;language-plaintext highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;cd examples/slurm
sh run.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;This will train the model (saved under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;output&lt;/code&gt;), print the logs on the screen (or to a file, when using Slurm),
and generate the MLflow plots (under &lt;code class=&quot;language-plaintext highlighter-rouge&quot;&gt;mlruns&lt;/code&gt;).
More details and instructions are available in the README file contained in the instantiated project itself - enjoy!&lt;/p&gt;</content><author><name>Mirko Bronzi</name></author><category term="coding" /><category term="machine_learning" /><category term="deep_learning" /><summary type="html">TL;DR Starting a new Deep Learning (DL) project usually means deciding between a flexible but less scalable approach in the long term (e.g., Google Colab), or a more organized one that requires a significant amount of time and work to set up properly (in particular when we plan to use several tools that should interact with one another).</summary></entry><entry><title type="html">Data Structures and Performances: Lists</title><link href="https://mirkobronzi.github.io/data-structure-and-performance-list/" rel="alternate" type="text/html" title="Data Structures and Performances: Lists" /><published>2011-07-18T00:00:00+00:00</published><updated>2011-07-18T00:00:00+00:00</updated><id>https://mirkobronzi.github.io/data-structure-and-performance:list</id><content type="html" xml:base="https://mirkobronzi.github.io/data-structure-and-performance-list/">&lt;p&gt;The goal of this post is to see how a contiguous-memory structure (arrays) compares to a pointer-based
one (linked lists).&lt;/p&gt;

&lt;h1 id=&quot;recap&quot;&gt;Recap&lt;/h1&gt;

&lt;p&gt;Let me start by recapping the difference between an array-based list (also called a dynamic array) and
a linked list.
The first one is based on the array concept, i.e., it stores elements contiguously in memory, preserving a given order;
the second one uses pointers to keep track of the order of the elements (see
&lt;a href=&quot;https://en.wikipedia.org/wiki/Array_data_structure#Efficiency_comparison_with_other_data_structures&quot;&gt;here&lt;/a&gt;
for a quick comparison between linked lists and dynamic arrays).&lt;/p&gt;

&lt;p&gt;These different ways of implementing a list provide different performance:
roughly speaking, an array list requires fewer resources to read elements and more resources to write
them (adding or removing), and vice versa for a linked list.&lt;/p&gt;
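&lt;p&gt;This trade-off can be reproduced outside Java as well. As a minimal sketch (in Python, just to keep the example short - the post’s actual experiments use Java’s ArrayList and LinkedList), a list plays the role of the array list, while collections.deque plays the role of the linked structure:&lt;/p&gt;

```python
from collections import deque
from timeit import timeit

N = 50_000  # number of insertions at the front

def fill_front_list():
    xs = []
    for i in range(N):
        xs.insert(0, i)   # array-backed: shifts every element, O(n) per insert

def fill_front_deque():
    xs = deque()
    for i in range(N):
        xs.appendleft(i)  # linked structure: O(1) per insert

list_time = timeit(fill_front_list, number=1)
deque_time = timeit(fill_front_deque, number=1)
print(f"list: {list_time:.3f}s, deque: {deque_time:.3f}s")
```

&lt;p&gt;On typical runs the deque version is orders of magnitude faster, matching the claim that writes are cheaper for pointer-based structures (at least at the ends of the list).&lt;/p&gt;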

&lt;h1 id=&quot;readwrite-performances&quot;&gt;Read/Write performances&lt;/h1&gt;
&lt;p&gt;In this post we are going to validate this assertion by experimentally comparing the performance of
these data structures. We are going to use the Java language, with the ArrayList and LinkedList classes.&lt;/p&gt;

&lt;p&gt;We start by comparing a simple read operation (getting an element from the middle of a list) and a
simple write operation (adding an element to the end of the list). The results are as follows:&lt;/p&gt;

&lt;p&gt;(x: size of list, y: time in ms)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/2011-07-18/get_median_element.png&quot; alt=&quot;Cost of getting the element in the middle&quot; /&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/2011-07-18/add_element_in_the_end.png&quot; alt=&quot;Cost of adding an element at the end of the list&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The “read” operation (getting the median element) is much faster with an array list:
getting an element from an array requires constant time. On the other hand, we would
expect the linked list to be faster when adding elements. As the second graph shows, this is actually
not the case: adding an element at the end of the list is a simple append operation that requires
no shifting of the other elements, even though the array list occasionally has to copy itself
to make room for more elements. This copy happens only a few
times: if the array doubles its capacity every time, it has to do it only log(n)
times (we start from capacity 2, then 4, then 8… up to n).
The linked list, instead, has to deal with pointers, which is always an added “burden” (in terms of space
and time).&lt;/p&gt;
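&lt;p&gt;The log(n) claim is easy to verify with a short counting sketch (assuming a doubling growth policy; real implementations may differ - Java’s ArrayList, for instance, grows by a factor of 1.5):&lt;/p&gt;

```python
def doublings_needed(n, initial_capacity=2):
    """Count how many times a capacity-doubling array must grow to hold n elements."""
    capacity, doublings = initial_capacity, 0
    while capacity < n:
        capacity *= 2
        doublings += 1
    return doublings

# Holding one million elements takes only 19 doublings: 2 -> 4 -> ... -> 1,048,576.
print(doublings_needed(1_000_000))
```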

&lt;p&gt;In order to validate our assumption about linked lists (i.e., that linked lists perform better when
dealing with “write” operations), we can try to recreate a scenario where the array list performs
worse due to its “less flexible” structure. We can do this by designing an experiment where we
add elements in the middle of the list (which requires the array list to shift all the following
elements to the right, while the linked list can simply change a pointer).&lt;/p&gt;

&lt;p&gt;(x: size of list, y: time in ms)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/2011-07-18/add_element_in_the_middle.png&quot; alt=&quot;Cost of adding an element in the middle of the list&quot; /&gt;&lt;/p&gt;

&lt;p&gt;Again, the linked list performs worse. This is because, before inserting the element, we have to find
the median element, and as we’ve seen, this is expensive for a linked structure. So, we can try
removing this problem by adding the element at the beginning of the list; this way the shifting
problem for the array list remains, while the linked list can access the element
(the first one) in constant time.&lt;/p&gt;

&lt;p&gt;(x: size of list, y: time in ms)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/2011-07-18/add_element_at_the_beginning.png&quot; alt=&quot;Cost of adding an element at the beginning of the list&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In fact, now the linked list performs much better.&lt;/p&gt;

&lt;p&gt;Summarizing: it is true that a linked list performs better when dealing with “write” operations, but
it is also true that accessing elements far from the start of the list
comes with a high cost, because we must follow the chain of pointers. It is important to highlight
that this is not true for the last elements: the linked list implementation is (usually)
“smart” enough to start the search from the nearest endpoint; as we can see from
the following graph, the worst case for linked list element access is in the middle of the list.&lt;/p&gt;

&lt;p&gt;(x: element index; y: time in ms)&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/assets/images/2011-07-18/element_access_cost.png&quot; alt=&quot;Cost of accessing an element&quot; /&gt;&lt;/p&gt;
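&lt;p&gt;The same “nearest endpoint” behaviour can be sketched with Python’s collections.deque (a doubly linked structure, used here only as a stand-in for Java’s LinkedList): accessing the first element is constant time, while accessing the middle forces a walk through roughly half of the links.&lt;/p&gt;

```python
from collections import deque
from timeit import timeit

d = deque(range(200_000))
middle = len(d) // 2

# Index 0 is a nearest endpoint: constant-time access.
end_time = timeit(lambda: d[0], number=500)
# The middle is as far as possible from both endpoints: ~n/2 links to follow.
middle_time = timeit(lambda: d[middle], number=500)
print(f"end: {end_time:.4f}s, middle: {middle_time:.4f}s")
```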

&lt;p&gt;We could consider linked lists “not so useful” because they perform almost always worse than an
array list. As a matter of fact, the only scenario where they perform better is when dealing with
elements at the beginning or at the end of the list. However, this scenario is very common: in
fact, when working with a stack, we are only interested in getting elements from the first position,
while with a queue we are only interested in the first and last positions. Because of this, and because
of the importance of stacks and queues, linked lists are a valuable data structure.&lt;/p&gt;</content><author><name>Mirko Bronzi</name></author><category term="algorithms" /><category term="performances" /><category term="data_structures" /><category term="java" /><summary type="html">The goal of this post is to see how a contiguous-memory structure (arrays) compares to a pointer-based one (linked lists).</summary></entry></feed>