Crispo - Excel Challenge 05 2025

Published

February 2, 2025

Challenge Description

Easy Sunday Excel Challenge

⭐Convert crosstab table to row-based table. ⭐Doctors, Patients and Days are repeated based on the number of appointment(s). ⭐e.g. Halley sees Liz 3 times a week on Wed.

Solutions

library(tidyverse)
library(readxl)

path = "files/Ex-Challenge 05 2025.xlsx"
input = read_excel(path, range = "B2:I7")
test  = read_excel(path, range = "K2:M21")

result = input %>%
  pivot_longer(cols = -c(1, 2), names_to = "Appointments", values_to = "value") %>%
  na.omit() %>%
  uncount(value) 

all.equal(result, test, check.attributes = FALSE)
# [1] TRUE

Logic:
- Pivot the Crosstab Table Use pivot_longer() to transform column names (appointment days) into a single “Appointments” column.
- Remove Missing Values Use na.omit() to filter out empty appointment entries.
- Expand Rows Based on Appointment Count Use uncount(value) to repeat rows based on the count of appointments.
Strengths:
- Compact and Readable: Uses tidyverse functions for a clear and structured pipeline.
- Efficient Row Expansion: uncount(value) dynamically expands rows based on appointment counts.
- Handles Missing Values Automatically: na.omit() ensures only meaningful data is retained.
Areas for Improvement:
- Formatting: Consider checking for duplicate separators (e.g., extra spaces or semicolons).
- Scalability: If demand values are floating-point numbers instead of integers, precision issues might arise.
Gem:
- uncount(value) is a powerful function that simplifies row expansion, making the transformation seamless.

import pandas as pd

path = "files/Ex-Challenge 05 2025.xlsx"
input = pd.read_excel(path, usecols="B:I", skiprows=1, nrows=5)
test = pd.read_excel(path, usecols="K:M", skiprows=1, nrows=20)\
    .rename(columns=lambda x: x.replace('.1', ''))\
    .sort_values(by=['Patient', 'Appointments']).reset_index(drop=True)

input_piv = input.melt(id_vars=input.columns[:2], var_name="Appointments", value_name="value")
input_piv = input_piv.dropna()
input_piv = input_piv[input_piv['value'] >= 0]
input_piv = input_piv.loc[input_piv.index.repeat(input_piv['value'])].reset_index(drop=True)
input_piv = input_piv.drop(columns=['value'])
input_piv = input_piv.sort_values(by=['Patient', 'Appointments']).reset_index(drop=True)

print(input_piv.equals(test)) # True

Logic:
- Pivot the Crosstab Table Use .melt() to move column names (appointment days) into a single column.
- Remove Missing Values Use .dropna() to filter out empty appointment entries.
- Expand Rows Based on Appointment Count Use .ld[input_piv.index.repeat(input_piv['value'])] to repeat rows based on appointment count.
- Sort for Consistency Use .sort_values() to match expected output.
Strengths:
- Efficient Ranking: Uses rank(method='dense', ascending=False).astype(int) to ensure consecutive ranking without gaps.2
- Concatenation of Fruits: Uses .groupby('Rank')['Fruit'].agg(' ; '.join) to merge fruit names within the same rank.
- Concise and Vectorized: Uses assign for ranking and groupby for aggregation, ensuring efficient execution.
Areas for Impqrovement:
- Optimize Row Expansion: .repeat(value) can be memory-intensive if the appointment count is high.
- Handle Edge Cases for Zeros Explicitly: Filtering out non-positive values (>= 0) ensures no errors, but a more explicit check would be better.
Gem
- .repeat(value) is a clean and efficient way to expand rows dynamically, avoiding complex loops.

Difficulty Level

This task is moderate:

Requires Data Reshaping: Uses .melt() in Python and pivot_longer() in R to convert wide data into a long format.

Row Expansion is Non-Trivial: Needs .repeat(value) in Python and uncount(value) in R to generate repeated rows dynamically.
Handles Missing Values Dynamically: Ensuring missing or empty values do not affect the transformation adds complexity.