The File Structure Of My Projects
After painstakingly working with a mess of files in various projects, I had to come up with a method to better handle the data. This semester, I took a course named Methodology of Scientific Research (page of the course), and we discussed good practices for handling data. I have been using the suggested structure ever since, and I believe the method mentioned in that course is applicable to those who might be reading this blog. Therefore, I wanted to share how I've been working over the past couple of months.
First, I want to talk about storing and categorizing data. The easiest way to do this is with a folder structure. During the course, there was extensive discussion on how the folder structure should be organized. I think what matters most is the structure within the project. The course suggested the file structure also recommended here, and I agree with it. I think this structure is easy for outsiders to understand and practical for users. Here is the nice visualization by Andrés: https://www.dry-lab.org/slides/2023/bsc/images/folder-struct.svg
One change I make is keeping a log file in the root of the project folder, where I record everything that happens in that project. This includes discussions, long-term and short-term goals, and everything in between. I keep the log file in Markdown format. While raw Markdown might not be the easiest thing to read for non-programmers, its styling characters don't impair the legibility of the text. I have tested it with two biologists, and it has worked well so far.
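As a rough illustration of how such a log could be kept, here is a minimal sketch that appends a dated entry to `log.md`. The function name `log_entry` and the layout (one `##` heading per day, one bullet per entry) are my own assumptions for this example, not something prescribed by the course.

```shell
# Hypothetical helper: append a dated entry to a project's log.md.
# Usage: log_entry <project_dir> <message...>
log_entry() {
    local project_dir="$1"
    shift
    local log_file="$project_dir/log.md"
    local today
    today="$(date +%Y-%m-%d)"
    # Add a heading for today unless the log already contains one
    if ! grep -qxF "## $today" "$log_file" 2>/dev/null; then
        printf '\n## %s\n\n' "$today" >> "$log_file"
    fi
    # Append the message as a Markdown bullet
    printf -- '- %s\n' "$*" >> "$log_file"
}
```

Calling `log_entry ~/ftp/puntcem/iu/my_project "Discussed sampling plan"` twice on the same day would add two bullets under a single date heading.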
During the course, using Cookiecutter was suggested to me, but I wasn't really into it at first since creating a few folders isn't a time-consuming task. However, in the past few months, I've been switching between projects more often than ever, and I use Jupyter Notebook for working on Python scripts. I was getting tired of handling kernels for each notebook and opening relevant ones. I switched to JupyterLab, which is a better product for that purpose, yet I still had to close all the notebooks and kernels when switching between projects.
I decided to create a Conda environment for each project. This lets me reconnect to my JupyterLab session and continue from where I left off, and it makes collaboration easier because everyone works with the same packages and versions. In short, I made my own kind of cookie cutter that creates the project folder structure and sets up a Conda environment within that folder for me. Having a separate environment for every project does take up extra disk space, but I believe it is worth it for the kind of work I do.
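To make the "same packages and versions" part concrete, here is a minimal sketch of how a per-project environment could be written to a file that collaborators can rebuild from. The function name `export_env` and the file name `environment.yml` are assumptions for this example; the environment is assumed to live in a `.conda` folder inside the project.

```shell
# Hypothetical helper: snapshot the project's Conda environment to a file.
# Usage: export_env <project_dir>
export_env() {
    local project_dir="$1"
    # Stop early if conda is not available in this shell
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    # Write the package list of the environment stored in <project_dir>/.conda
    conda env export --prefix "$project_dir/.conda" > "$project_dir/environment.yml"
}
```

A collaborator could then, in principle, recreate the environment from the project folder with `conda env create --prefix ./.conda --file environment.yml`.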
I've been optimizing my folder structure for just a few months now, and as an early-career researcher, I have limited experience in these areas. I'm sure I will change and improve my approach over time. I will try to keep this page updated, but to be sure, please refer to my GitHub profile to see if there are newer scripts I utilize in this regard.
For more information, please refer to this beautiful presentation: https://www.dry-lab.org/slides/2023/bsc/#/folder-structure-for-data-projects
Here is the current script I use to create the basic structure of the project folder:
In the script, "muggels" refers to people who can't code. In plain English, the prompt asks whether you want to skip creating a new Conda environment: answering "n" creates one, while the default "y" does not.
#!/bin/bash
# Function to ask user for muggels project preference
ask_for_muggels() {
# -r keeps backslashes in the answer from being treated as escape characters
read -r -p "Will you be working exclusively with muggels for this project? (y/n, default: y): " answer
case "$answer" in
[Nn]* ) return 1;; # User does not want to work exclusively with muggels
* ) return 0;; # Default to working with muggels (assume 'yes')
esac
}
# Check if project name is provided as argument
if [ $# -eq 0 ]; then
echo "Usage: $0 <project_name>"
exit 1
fi
# Assign project name from the first argument
project_name=$1
# Directories to create
directories=(
"code"
"bib"
"docs"
"results"
"extra"
".conda"
)
# Create directories using project name
for dir in "${directories[@]}"
do
mkdir -p ~/ftp/puntcem/iu/"$project_name"/"$dir"
done
# Create a new log.md file in the project directory
touch ~/ftp/puntcem/iu/"$project_name"/log.md
# Ask user if working exclusively with muggels
ask_for_muggels
muggels_only=$?
# If not working exclusively with muggels, proceed with conda environment setup
if [ "$muggels_only" -ne 0 ]; then
# Create conda environment and accept all prompts with -y flag
conda create -y --prefix ~/ftp/puntcem/iu/"$project_name"/.conda
# Make sure the user-level envs directory exists before linking into it
mkdir -p ~/.conda/envs
# Create symlink so the environment can be activated by name
ln -s ~/ftp/puntcem/iu/"$project_name"/.conda ~/.conda/envs/"$project_name"
# Load conda's shell functions so that "conda activate" works inside a script
eval "$(conda shell.bash hook)"
# Activate conda environment
conda activate "$project_name"
# Install required packages (example: jupyterlab) and accept all prompts with -y flag
conda install -y -n "$project_name" -c conda-forge jupyterlab
fi
# Change directory to the project directory
cd ~/ftp/puntcem/iu/"$project_name"
# Launch the markdown editor with the log.md file
marktext -n --disable-gpu log.md &
echo "Project setup completed for '$project_name'."
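The script above hard-codes the base path `~/ftp/puntcem/iu`. For anyone who wants to adapt it, here is a minimal sketch of the folder-creation step with a configurable base directory. The function name `create_project_tree` and the environment variable `PROJECTS_ROOT` are hypothetical names introduced for this example; the folder names are the ones used in the script.

```shell
# Hypothetical variant of the folder-creation step.
# PROJECTS_ROOT is an assumed variable; it falls back to the path used in the script.
# Usage: create_project_tree <project_name>
create_project_tree() {
    local project_name="$1"
    local root="${PROJECTS_ROOT:-$HOME/ftp/puntcem/iu}"
    local project_dir="$root/$project_name"
    local dir
    # Create each subfolder; mkdir -p leaves existing folders untouched
    for dir in code bib docs results extra; do
        mkdir -p "$project_dir/$dir" || return 1
    done
    # touch creates log.md if it is missing and keeps existing content otherwise
    touch "$project_dir/log.md"
    # Print the project path so callers can cd into it
    printf '%s\n' "$project_dir"
}
```

With this, something like `cd "$(PROJECTS_ROOT=~/work create_project_tree my_project)"` would create the folders under `~/work` and move into the new project.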