The File Structure Of My Projects

July 03, 2024

After painstakingly working with a mess of files across various projects, I had to come up with a better way to handle my data. This semester, I took a course named Methodology of Scientific Research (page of the course), where we discussed good practices for handling data. I have been using the suggested structure ever since, and I believe the approach taught in that course will be useful to readers of this blog as well. So I wanted to share how I've been working over the past couple of months.

First, I want to talk about storing and categorizing data. The easiest way to do this is with a folder structure. During the course, there was extensive discussion about how that folder structure should be organized. In my view, what matters most is the structure within each project. The course recommended a particular project structure, and I agree with it: I think it is easy for outsiders to understand and practical for the people using it. Here is a nice visualization of it by Andrés: https://www.dry-lab.org/slides/2023/bsc/images/folder-struct.svg
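
If you adopt a fixed structure like this, it helps to be able to spot a project folder that has drifted away from it. Below is a minimal Bash sketch of such a check. It is my own illustration, not part of the course material: the folder list (code, bib, docs, results, extra) is taken from my setup script, and the function name check_structure is hypothetical.

```shell
# Sketch: report which expected top-level folders are missing from a project.
# The folder list is an assumption based on my own setup script; adjust it.
expected_dirs=(code bib docs results extra)

# check_structure <project_dir>
# Prints "missing: <name>" for every absent folder.
# Returns 0 if all folders exist, 1 otherwise.
check_structure() {
    local project_dir=$1
    local missing=0
    local dir
    for dir in "${expected_dirs[@]}"; do
        if [ ! -d "$project_dir/$dir" ]; then
            echo "missing: $dir"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_structure ~/ftp/puntcem/iu/my_project || echo "structure incomplete"` (with my_project standing in for a real project name).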

One change I make is creating a log file in the root of the project folder, where I record everything that happens in the project: discussions, long-term and short-term goals, and everything in between. I keep the log file in Markdown format. Although Markdown may not be the easiest format for people who don't program, its styling characters don't impair the legibility of the text. I have tested it with two biologists, and so far it has worked well.
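
Adding to such a log can be as simple as a tiny shell helper. The sketch below is one hypothetical way to do it, assuming entries are grouped under a `## YYYY-MM-DD` heading with one bullet per entry; both the function name add_log_entry and that entry format are my own illustration, not a standard.

```shell
# Sketch: append a dated entry to a project's Markdown log.
# Hypothetical format: one "## YYYY-MM-DD" heading per day, one bullet per entry.

# add_log_entry <log_file> <text...>
add_log_entry() {
    local log_file=$1
    shift
    local today
    today=$(date +%Y-%m-%d)
    # Start a new dated heading only if today's heading is not in the file yet.
    if ! grep -qxF "## $today" "$log_file" 2>/dev/null; then
        # Separate the new day from earlier content if the file is non-empty.
        if [ -s "$log_file" ]; then
            printf '\n' >> "$log_file"
        fi
        printf '## %s\n\n' "$today" >> "$log_file"
    fi
    # The entry itself is a single Markdown bullet.
    printf '%s\n' "- $*" >> "$log_file"
}
```

For example, `add_log_entry log.md "Agreed on the sampling plan with the lab"` run from the project folder.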

During the course, Cookiecutter was suggested to me, but I wasn't really into it at first, since creating a few folders isn't a time-consuming task. However, in the past few months I've been switching between projects more often than ever, and I use Jupyter Notebook for working on Python scripts. I was getting tired of managing kernels for each notebook and opening the relevant ones. I switched to JupyterLab, which is better suited for this, yet I still had to close all the notebooks and kernels whenever I switched between projects.

I decided to create a Conda environment for each project. This lets me reconnect to my JupyterLab session and continue from where I left off, and it makes collaboration easier because everyone works with the same packages and versions. In short, I made my own kind of cookie cutter that creates the project folder structure and sets up a Conda environment within that folder. Keeping a separate environment for every project does take up extra disk space, but I believe it is worth it for the kind of work I do.
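
For the collaboration part, Conda can write an environment's package list to a file that others can rebuild the environment from. Here is a minimal sketch, assuming the environment lives in the project's .conda folder (as in my setup script). export_project_env and recreate_project_env are hypothetical helper names; `conda env export` and `conda env create` are standard Conda commands.

```shell
# Sketch: share a project-local Conda environment via environment.yml.
# Assumes the environment lives in <project_dir>/.conda.

# export_project_env <project_dir>
# Writes <project_dir>/environment.yml from the project's environment.
export_project_env() {
    local project_dir=$1
    # --no-builds drops platform-specific build strings (versions are kept),
    # which makes the file more portable between machines.
    # Write to a temporary file first so a failed export does not
    # overwrite an existing environment.yml.
    conda env export --prefix "$project_dir/.conda" --no-builds \
        > "$project_dir/environment.yml.tmp" &&
        mv "$project_dir/environment.yml.tmp" "$project_dir/environment.yml"
}

# recreate_project_env <project_dir>
# Rebuilds <project_dir>/.conda from <project_dir>/environment.yml.
recreate_project_env() {
    local project_dir=$1
    conda env create --prefix "$project_dir/.conda" \
        --file "$project_dir/environment.yml"
}
```

The idea is to run export_project_env after installing new packages and keep environment.yml next to the code; a collaborator then runs recreate_project_env on their own copy of the project, before any .conda folder exists there.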

I've been optimizing my folder structure for only a few months now, and as an early-career researcher, I have limited experience in these areas. I'm sure I will change and improve my approach over time. I will try to keep this page updated, but to be sure, please check my GitHub profile for newer versions of the scripts I use for this.

For more information, please refer to this excellent presentation: https://www.dry-lab.org/slides/2023/bsc/#/folder-structure-for-data-projects

Here is the current script I use to create the basic structure of a project folder:

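In the script, "muggels" (muggles) refers to people who can't code. In plain English, the prompt asks whether you need a new Conda environment: answering "n" (you will not be working exclusively with muggles) creates one, while the default "y" skips it.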

#!/bin/bash
# Create a new project folder with the standard structure, a log file,
# and (optionally) a project-local Conda environment.

# Ask whether the project involves only muggles (people who can't code).
# Returns 0 for "yes" (the default) and 1 for "no".
ask_for_muggels() {
    read -r -p "Will you be working exclusively with muggles on this project? (y/n, default: y): " answer
    case "$answer" in
        [Nn]* ) return 1;; # Not muggles only: a Conda environment will be created
        * ) return 0;;     # Default: muggles only, no Conda environment
    esac
}

# Check that a project name was provided as an argument
if [ $# -eq 0 ]; then
    echo "Usage: $0 <project_name>"
    exit 1
fi

# Assign the project name from the first argument and build the project path
project_name=$1
project_dir="$HOME/ftp/puntcem/iu/$project_name"

# Directories to create (.conda is created by conda itself when needed)
directories=(
    "code"
    "bib"
    "docs"
    "results"
    "extra"
)

# Create the directories inside the project folder
for dir in "${directories[@]}"
do
    mkdir -p "$project_dir/$dir"
done

# Create an empty log.md file in the project directory
touch "$project_dir/log.md"

# Ask the user whether they work exclusively with muggles
ask_for_muggels
muggels_only=$?

# If not working exclusively with muggles, set up a Conda environment
if [ "$muggels_only" -ne 0 ]; then
    # Create the environment inside the project folder (-y accepts all prompts)
    conda create -y --prefix "$project_dir/.conda"

    # Symlink it into ~/.conda/envs so it can be activated by name later
    mkdir -p "$HOME/.conda/envs"
    ln -s "$project_dir/.conda" "$HOME/.conda/envs/$project_name"

    # Install required packages (example: jupyterlab); -y accepts all prompts.
    # No activation is needed: --prefix targets the environment directly, and
    # "conda activate" inside a script would not affect the calling shell anyway.
    conda install -y --prefix "$project_dir/.conda" -c conda-forge jupyterlab
fi

# Change to the project directory (this only affects the rest of the script)
cd "$project_dir" || exit 1

# Launch the Markdown editor with the log.md file
marktext -n --disable-gpu log.md &

echo "Project setup completed for '$project_name'."
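
Once a project exists, picking it up again can be reduced to one command that starts JupyterLab from the project's own environment and folder. This is a hedged sketch, assuming the layout created by the setup script (an environment with jupyterlab installed in the project's .conda folder). open_project and the PROJECTS_ROOT variable are hypothetical names; `conda run` is used so the environment does not have to be activated first, and its `--no-capture-output` flag needs a reasonably recent Conda.

```shell
# Hypothetical root folder; defaults to the path used by my setup script.
PROJECTS_ROOT=${PROJECTS_ROOT:-"$HOME/ftp/puntcem/iu"}

# open_project <project_name>
# Starts JupyterLab from the project's own Conda environment, with the
# project folder as the working directory.
open_project() {
    local project_dir="$PROJECTS_ROOT/$1"
    if [ ! -d "$project_dir/.conda" ]; then
        echo "No project environment found in $project_dir" >&2
        return 1
    fi
    # Subshell: the cd does not change the caller's working directory.
    (
        cd "$project_dir" &&
            conda run --no-capture-output --prefix "$project_dir/.conda" jupyter lab
    )
}
```

Usage would be `open_project my_project`, with my_project standing in for a real project name. Each project then gets its own JupyterLab server and kernels, so nothing from the previous project has to be closed by hand.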

NitroxHead's Blog
