library(tidyverse)
library(readxl)
path <- "Excel/900-999/902/902 Total Hours.xlsx"
input <- read_excel(path, range = "A2:C36")
test <- read_excel(path, range = "E2:F6")
result = input %>%
separate(Timing, into = c("Start", "End"), sep = "-") %>%
mutate(
Start = as.POSIXct(Start, format = "%H:%M"),
End = as.POSIXct(End, format = "%H:%M"),
Duration = as.numeric(difftime(
if_else(End < Start, End + lubridate::days(1), End),
Start,
units = "hours"
)),
Seniority = ifelse(
as.numeric(str_extract(EmpID, "\\d+")) > 150,
TRUE,
FALSE
)
) %>%
mutate(
`Total Hours` = Duration *
ifelse(Seniority, 1.2, 1) *
case_when(
Category == "ALPHA" ~ 1,
Category == "BETA" ~ 1.5,
Category == "GAMMA" ~ 2
)
) %>%
summarise(`Total Hours` = sum(`Total Hours`), .by = Category) %>%
janitor::adorn_totals("row")
all.equal(result, test, check.attributes = FALSE)
# [1] TRUEExcel BI - Excel Challenge 902
excel-challenges
excel-formulas
🔰 Calculate the Duration for Different categories.

Challenge Description
🔰 Calculate the Duration for Different categories. If numeric portion of EmpID is >150, then duration will be 20% more as a seniority surcharge. Also duration will be multiplied by 1.5 for Beta category and 2 for Gamma category.
Solutions
- Logic: Read the workbook ranges needed for the challenge; Derive the required intermediate columns; Parse the packed text or string structure; Aggregate or rank the data at the required grouping level.
- Strengths: The solution stays close to the text pattern itself, which makes the extraction logic easy to audit.
- Areas for Improvement: The solution assumes the workbook layout and selected ranges remain stable, so any structural change in the sheet would require small adjustments.
- Gem: A small number of well-targeted text patterns does most of the heavy lifting.
import pandas as pd
import numpy as np
from datetime import timedelta
path = "Excel/900-999/902/902 Total Hours.xlsx"
input_data = pd.read_excel(path, usecols="A:C", skiprows=1, nrows=35)
test = pd.read_excel(path, usecols="E:F", skiprows=1, nrows=4).rename(columns=lambda x: x.replace(".1", ""))
test['Total Hours'] = test['Total Hours'].round(2)
input_data[['Start', 'End']] = input_data['Timing'].str.split('-', expand=True)
input_data['Start'] = pd.to_datetime(input_data['Start'], format="%H:%M")
input_data['End'] = pd.to_datetime(input_data['End'], format="%H:%M")
input_data['Duration'] = input_data.apply(
lambda row: (row['End'] + timedelta(days=1) if row['End'] < row['Start'] else row['End']) - row['Start'], axis=1
).dt.total_seconds() / 3600
input_data['Total Hours'] = input_data['Duration'] * \
np.where(input_data['EmpID'].str.extract(r'(\d+)')[0].astype(int) > 150, 1.2, 1) * \
input_data['Category'].map({"ALPHA": 1, "BETA": 1.5, "GAMMA": 2})
result = input_data.groupby('Category')['Total Hours'].sum().reset_index()
result.loc[len(result)] = ['Total', result['Total Hours'].sum()]
result['Total Hours'] = result['Total Hours'].round(2)
print(test.equals(result))The Python version follows the same grouped logic and keeps the transformation explicit in a dataframe pipeline.
Difficulty Level
Medium
The individual steps are manageable, but the correct transformation pattern is not obvious from the raw data.