Omid - Challenge 219

data-challenges
advanced-exercises
🔰 FOr example the highlighted cell showes, the activity 4 can be start after finishing the task 1 and 2.
Published

March 24, 2026

Illustration for Omid - Challenge 219

Challenge Description

🔰 FOr example the highlighted cell showes, the activity 4 can be start after finishing the task 1 and 2.

Solutions

library(tidyverse); library(readxl)
path <- "files/CH-219 Project scheduling.xlsx"
input <- read_excel(path, range = "B2:D11")
test <- read_excel(path, range = "F2:H11")

schedule_tasks <- function(tasks, start_date = Sys.Date()) {
    tasks_df <- tasks |> transmute(
        id = as.character(`Task Name`),
        duration = `Duration (day)`,
        predecessors = str_split(Predecessors, ",") |> map(~ setdiff(.x, "-") |> str_trim())
    )

    task_durations <- setNames(tasks_df$duration, tasks_df$id)
    task_preds <- setNames(tasks_df$predecessors, tasks_df$id)

    topo_order <- {
        visited <- setNames(rep(FALSE, nrow(tasks_df)), tasks_df$id)
        result <- character()
        dfs <- function(node) {
            if (visited[node]) return()
            visited[node] <<- TRUE
            walk(task_preds[[node]], dfs)
            result <<- c(node, result)
        }
        walk(tasks_df$id, dfs)
        rev(result)
    }

    dates <- list()
    for (id in topo_order) {
        preds <- task_preds[[id]]
        start <- if (length(preds) == 0) start_date else max(sapply(preds, \(p) dates[[p]]$end))
        end <- start + task_durations[[id]]
        dates[[id]] <- list(start = start, end = end)
    }

    tibble(
        id = names(dates),
        start_date = map_dbl(dates, "start") |> as.Date(origin = "1970-01-01"),
        end_date = map_dbl(dates, "end") |> as.Date(origin = "1970-01-01") - 1
    ) |> arrange(id)
}

schedule_tasks(input, as.Date("2025-04-01"))
  • Logic:

    • Reads the workbook ranges needed for the challenge

    • Parses the text patterns directly instead of relying on manual cleanup

    • Applies the rule iteratively until the output stabilizes

  • Strengths:

    • The R solution stays close to the workbook rule and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the sheet structure and source ranges remain stable.
  • Gem:

    • The strongest part of the solution is choosing the right intermediate representation before shaping the final output.
import pandas as pd
from datetime import datetime, timedelta

path = "CH-219 Project scheduling.xlsx"
input = pd.read_excel(path, usecols="B:D", skiprows=1, nrows=9)
test = pd.read_excel(path, usecols="F:H", skiprows=1, nrows=9)

def schedule_tasks(tasks, start_date=datetime.now().date()):
    tasks = pd.DataFrame({
        'id': tasks['Task Name'].astype(str),
        'duration': tasks['Duration (day)'],
        'predecessors': [[p.strip() for p in str(pred).split(',') if p.strip() != '-'] if not pd.isna(pred) else [] for pred in tasks['Predecessors']]
    })
    
    task_durations = dict(zip(tasks['id'], tasks['duration']))
    task_preds = dict(zip(tasks['id'], tasks['predecessors']))
    
    all_task_ids = set(tasks['id'])
    for task_id, preds in task_preds.items():
        for pred in preds:
            if pred not in all_task_ids:
                raise ValueError(f"Task {task_id} has predecessor {pred} that doesn't exist")
    
    visited = {id: False for id in tasks['id']}
    temp = {id: False for id in tasks['id']} 
    result = []
    
    def dfs(node):
        if visited[node]:
            return
        if temp[node]:
            raise ValueError(f"Cycle detected in task dependencies involving task {node}")
            
        temp[node] = True

        for pred in task_preds[node]:
            dfs(pred)
            
        temp[node] = False
        visited[node] = True
        result.append(node)
    
    for task_id in tasks['id']:
        if not visited[task_id]:
            dfs(task_id)
    
    topo_order = result
    
    dates = {}
    for id in topo_order:
        start = start_date if not task_preds[id] else max(dates[p]['end'] for p in task_preds[id])
        dates[id] = {'start': start, 'end': start + timedelta(days=task_durations[id])}
    
    return pd.DataFrame({
        'id': list(dates.keys()),
        'start_date': [dates[id]['start'] for id in dates],
        'end_date': [dates[id]['end'] - timedelta(days=1) for id in dates]
    }).sort_values('id').reset_index(drop=True)

print(schedule_tasks(input, datetime.strptime("2025-04-01", "%Y-%m-%d").date()))
  • Logic:

    • Reads the workbook ranges needed for the challenge

    • Applies the rule iteratively until the output stabilizes

  • Strengths:

    • The Python version follows the same rule in a direct dataframe-oriented implementation.
  • Areas for Improvement:

    • The code assumes the workbook layout remains stable, so any sheet redesign would require small adjustments.
  • Gem:

    • The implementation stays close to the original workbook rule instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate to challenging:

  • It depends on a non-trivial iterative or rule-based transformation.

  • Getting the expected output requires more than one straightforward dataframe step.