dlc2action.project

Project interface

The most convenient way to use `dlc2action` is through the high-level project interface, defined in the `project` module. Its main functions are managing configuration files and keeping track of experiments. When you create a `project.Project` instance with a previously unused name, it generates a new project folder with results, history and configuration files.

```
.
project_name
├── config
├── meta
├── saved_datasets
└── results
    ├── logs
    │   └── episode.txt
    ├── models
    │   └── episode
    │       ├── epoch25.pt
    │       └── epoch50.pt
    ├── searches
    │   └── search
    │       ├── search_param_importances.html
    │       └── search_contour.html
    ├── splits
    │   ├── time_25.0%validation_10.0%test.txt
    │   └── random_20.0%validation_10.0%test.txt
    ├── suggestions
    │   └── active_learning
    │       ├── video1_suggestion.pickle
    │       ├── video2_suggestion.pickle
    │       └── al_points.pickle
    └── predictions
        ├── episode_epoch25.pickle
        └── episode_epoch50_newdata.pickle
```

Here is an explanation of this structure.

The **config** folder contains .yaml configuration files. `Project` instances can read them into a parameter dictionary and update them. The readers understand several blanks for parameters that can be inferred from the data at runtime:

  • 'dataset_features' will be replaced with the shape of features per frame in the data,
  • 'dataset_classes' will be replaced with the number of classes,
  • 'dataset_inverse_weights' in losses.yaml will be replaced with a list of float values that are inversely proportional to the number of frames labeled with the corresponding classes,
  • 'dataset_len_segment' will be replaced with the segment length of the data,
  • 'model_features' will be replaced with the shape of features per frame in the model feature extraction output (the input to SSL modules).
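The 'dataset_inverse_weights' blank can be illustrated with a minimal sketch. The helper name and the exact normalization are assumptions for illustration (dlc2action's internal computation may normalize differently), and the frame counts are made up:

```python
def inverse_class_weights(frame_counts):
    """Return per-class weights inversely proportional to labeled frame counts."""
    return [1.0 / count for count in frame_counts]

# Hypothetical label statistics: number of frames labeled with each class.
counts = [1000, 250, 50]
weights = inverse_class_weights(counts)
# Rare classes receive larger weights, so the loss pays more attention to them.
```

Weighting a loss this way counteracts class imbalance: a class labeled on few frames contributes as much to the loss as a frequent one.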

Pickled history files go in the **meta** folder. They are all pandas dataframes that store the relevant task parameters, a summary of experiment results (where applicable) and some meta information, such as additional parameters or the time when the record was added. There are separate files for the history of training episodes, hyperparameter searches, predictions, saved datasets and active learning file generations. The classes that handle those files are defined in the `meta` module.
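As a rough stand-in using only the standard library (the field names below are illustrative, not dlc2action's actual schema, which uses pandas dataframes), one history record might combine parameters, results and meta information like this:

```python
import os
import pickle
import tempfile
from datetime import datetime, timezone

# One row of an episode history table: the task parameters, a summary of
# experiment results and meta information such as when the record was added.
# (Field names are illustrative, not dlc2action's actual schema.)
record = {
    "episode": "episode",
    "parameters": {"lr": 1e-4, "num_epochs": 50},
    "results": {"f1": 0.72},
    "added": datetime.now(timezone.utc).isoformat(),
}

path = os.path.join(tempfile.mkdtemp(), "episodes.pickle")
with open(path, "wb") as f:
    pickle.dump(record, f)

with open(path, "rb") as f:
    loaded = pickle.load(f)
```

Because the records are pickled Python objects, they can be reloaded later to reproduce an experiment's exact configuration.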

When a dataset is generated (the features are extracted and cut), it is saved in the **saved_datasets** folder. Every time you create a new task, `Project` checks the saved dataset records and loads pre-computed features if they exist. You can always safely delete the saved datasets to free up space with the `remove_datasets()` function.
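The caching behaviour amounts to a load-or-compute pattern keyed on the extraction parameters. The sketch below illustrates the idea only; it is not dlc2action's actual implementation, and the key scheme and file layout are assumptions:

```python
import hashlib
import json
import os
import pickle
import tempfile

def cached_dataset(saved_datasets, params, compute):
    """Load a pre-computed dataset matching `params`, or compute and save it."""
    # Derive a stable key from the extraction parameters.
    key = hashlib.md5(json.dumps(params, sort_keys=True).encode()).hexdigest()
    path = os.path.join(saved_datasets, f"{key}.pickle")
    if os.path.exists(path):
        # A dataset with the same parameters was saved before: reuse it.
        with open(path, "rb") as f:
            return pickle.load(f)
    data = compute()
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data

folder = tempfile.mkdtemp()
first = cached_dataset(folder, {"len_segment": 128}, lambda: [1, 2, 3])
second = cached_dataset(folder, {"len_segment": 128}, lambda: [9, 9, 9])
# `second` comes from the cache, so it equals `first`.
```

Deleting the cache folder is safe under this scheme because any dataset can be recomputed from the raw data, which is why `remove_datasets()` only costs recomputation time.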

Everything else is stored in the **results** folder. The text training log files go into the **logs** subfolder. Model checkpoints (with 'model_state_dict', 'optimizer_state_dict' and 'epoch' keys) are saved in the **models** subfolder. The main results of hyperparameter searches (best parameters and best values) are kept in the meta files, but the searches also generate HTML plots that can be accessed in the **searches** subfolder. Split text files can be found in the **splits** subfolder. They are also checked every time you create a task, and if a split with the same parameters already exists it is loaded instead of being recomputed. Active learning files are saved in the **suggestions** subfolder. Suggestions for each video are named *{video_id}_suggestion.pickle* and the active learning file is always *al_points.pickle*. Finally, prediction files (pickled dictionaries) are stored in the **predictions** subfolder.
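The split filenames shown in the tree encode the split method and the validation/test fractions. Assuming the `{method}_{val}%validation_{test}%test.txt` pattern holds in general (an inference from the two examples above, not a documented guarantee), they can be parsed like this:

```python
import re

def parse_split_name(filename):
    """Extract the split method and validation/test fractions from a split file name.

    Assumes the '{method}_{val}%validation_{test}%test.txt' naming pattern
    seen in the project tree; this pattern is inferred, not documented.
    """
    m = re.fullmatch(r"(\w+?)_([\d.]+)%validation_([\d.]+)%test\.txt", filename)
    if m is None:
        raise ValueError(f"unrecognized split file name: {filename}")
    method, val, test = m.groups()
    # Convert the percentages to fractions.
    return method, float(val) / 100, float(test) / 100
```

For example, `parse_split_name("time_25.0%validation_10.0%test.txt")` yields the method `"time"` with a 25% validation and 10% test split.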

```python
#
# Copyright 2020-present by A. Mathis Group and contributors. All rights reserved.
#
# This project and all its files are licensed under GNU AGPLv3 or later version.
# A copy is included in dlc2action/LICENSE.AGPL.
#
from dlc2action.project.project import *
from dlc2action.project.meta import *
```