# dlc2action.project

## Project interface
The most convenient way to use `dlc2action` is through the high-level project interface. It is defined in the
`project` module, and its main functions are managing configuration files and keeping track of experiments.
When you create a `project.Project` instance with a previously unused name, it generates a new project folder
with results, history and configuration files.
```
.
project_name
├── config
├── meta
├── saved_datasets
└── results
    ├── logs
    │   └── episode.txt
    ├── models
    │   └── episode
    │       ├── epoch25.pt
    │       └── epoch50.pt
    ├── searches
    │   └── search
    │       ├── search_param_importances.html_docs
    │       └── search_contour.html_docs
    ├── splits
    │   ├── time_25.0%validation_10.0%test.txt
    │   └── random_20.0%validation_10.0%test.txt
    ├── suggestions
    │   └── active_learning
    │       ├── video1_suggestion.pickle
    │       ├── video2_suggestion.pickle
    │       └── al_points.pickle
    └── predictions
        ├── episode_epoch25.pickle
        └── episode_epoch50_newdata.pickle
```
Here is an explanation of this structure.
The **config** folder contains .yaml configuration files. Project instances can read them into a parameter
dictionary and update them. Those readers understand several blanks for certain parameters that can be inferred
from the data at runtime:

* `'dataset_features'` will be replaced with the shape of features per frame in the data,
* `'dataset_classes'` will be replaced with the number of classes,
* `'dataset_inverse_weights'` in losses.yaml will be replaced with a list of float values that are inversely
  proportional to the number of frames labeled with the corresponding classes,
* `'dataset_len_segment'` will be replaced with the length of segment in the data,
* `'model_features'` will be replaced with the shape of features per frame in the model feature extraction
  output (the input to SSL modules).
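For illustration, inverse class weights of this kind can be computed from per-class frame counts. The helper below is a hypothetical sketch of the idea, not dlc2action's actual implementation:

```python
from collections import Counter

def inverse_class_weights(frame_labels):
    """Return per-class weights inversely proportional to label frequency.

    frame_labels: iterable of class indices, one per labeled frame.
    (Hypothetical helper; dlc2action fills in 'dataset_inverse_weights'
    internally from the dataset.)
    """
    counts = Counter(frame_labels)
    # weight_c is proportional to 1 / n_c, normalized to sum to 1
    inverse = {c: 1.0 / n for c, n in counts.items()}
    total = sum(inverse.values())
    return {c: w / total for c, w in sorted(inverse.items())}

# class 1 is three times rarer than class 0, so it gets three times the weight
weights = inverse_class_weights([0, 0, 0, 1])
```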
Pickled history files go in the **meta** folder. They are all pandas dataframes that store the relevant task
parameters, a summary of experiment results (where applicable) and some meta information, such as additional
parameters or the time when the record was added. There are separate files for the history of training episodes,
hyperparameter searches, predictions, saved datasets and active learning file generations. The classes that handle
those files are defined in the `meta` module.
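Because the history files are plain pickled dataframes, they can also be inspected directly with pandas. The file name and columns below are hypothetical, only meant to show the round-trip:

```python
import tempfile
from pathlib import Path

import pandas as pd

# A toy history table in the same spirit as the meta files
# (the column names here are illustrative, not dlc2action's actual schema).
history = pd.DataFrame(
    {
        "episode": ["episode_1", "episode_2"],
        "lr": [1e-3, 1e-4],
        "f1": [0.61, 0.67],
    }
)

# History files are pickled dataframes, so a plain pickle round-trip works.
path = Path(tempfile.mkdtemp()) / "episodes.pickle"
history.to_pickle(path)
loaded = pd.read_pickle(path)

# e.g. find the best-scoring episode recorded so far
best = loaded.loc[loaded["f1"].idxmax(), "episode"]
```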
When a dataset is generated (the features are extracted and cut), it is saved in the **saved_datasets** folder. Every time you create a new task, `Project` will check the saved dataset records and load pre-computed features if they exist. You can always safely clean the datasets to save space with the `remove_datasets()` function.
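The record check can be thought of as a parameter-keyed cache. The sketch below illustrates that pattern; the key scheme and file layout are assumptions for illustration, not dlc2action's actual mechanism:

```python
import hashlib
import json
import pickle
from pathlib import Path

def dataset_cache_path(root, params):
    """Map a parameter dictionary to a stable file name (illustrative only)."""
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()[:16]
    return Path(root) / f"dataset_{key}.pickle"

def load_or_compute(root, params, compute):
    """Reuse a saved dataset when one was generated with the same parameters."""
    path = dataset_cache_path(root, params)
    if path.exists():
        # same parameters seen before: load the pre-computed features
        with open(path, "rb") as f:
            return pickle.load(f)
    data = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(data, f)
    return data
```

Deleting the cached files (as `remove_datasets()` does for real saved datasets) is always safe in this scheme, since the data is recomputed on the next request.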
Everything else is stored in the **results** folder. The text training log files go into the **logs** subfolder. Model
checkpoints (with `'model_state_dict'`, `'optimizer_state_dict'` and `'epoch'` keys) are saved in the **models**
subfolder. The main results of hyperparameter searches (best parameters and best values) are kept in the meta files,
but they also generate html_docs plots that can be accessed in the **searches** subfolder. Split text files can be found
in the **splits** subfolder. They are also checked every time you create a task, and if a split with the same
parameters already exists it will be loaded. Active learning files are saved in the **suggestions** subfolder.
Suggestions for each video are named *{video_id}_suggestion.pickle* and the active learning file is always
*al_points.pickle*. Finally, prediction files (pickled dictionaries) are stored in the **predictions** subfolder.
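Since prediction files are plain pickled dictionaries, they can be opened with the standard library alone. The dictionary shape below (video ids mapping to per-frame scores) is an assumption made for illustration, not a documented schema:

```python
import pickle
import tempfile
from pathlib import Path

# A toy prediction dictionary; the real contents are whatever dlc2action
# saved, this shape is only an illustration.
predictions = {
    "video1": [[0.1, 0.9], [0.8, 0.2]],  # hypothetical per-frame class scores
    "video2": [[0.5, 0.5]],
}

path = Path(tempfile.mkdtemp()) / "episode_epoch25.pickle"
with open(path, "wb") as f:
    pickle.dump(predictions, f)

# Reading a prediction file back is a plain pickle.load
with open(path, "rb") as f:
    loaded = pickle.load(f)
```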
The module re-exports everything from `dlc2action.project.project` and `dlc2action.project.meta`:

```python
#
# Copyright 2020-2022 by A. Mathis Group and contributors. All rights reserved.
#
# This project and all its files are licensed under GNU AGPLv3 or later version. A copy is included in dlc2action/LICENSE.AGPL.
#

from dlc2action.project.project import *
from dlc2action.project.meta import *
```