Omid - Challenge 46

data-challenges
advanced-exercises
🔰 Numbers Grouping!
Published

March 24, 2026

Illustration for Omid - Challenge 46

Challenge Description

🔰 Numbers Grouping!

Solutions

library(tidyverse)
library(readxl)

input = read_excel("files/CH-046 Numbers Cleaning.xlsx", range = "B2:B16")
test  = read_excel("files/CH-046 Numbers Cleaning.xlsx", range = "J2:J7")

result = input %>%
  arrange(`Product ID`) %>%
  mutate(group = cumsum(`Product ID` - lag(`Product ID`, default = 0) > 1)) %>%
  summarise(`Product ID` = if_else(max(`Product ID`) == min(`Product ID`), 
                                   as.character(max(`Product ID`)), 
                                   paste(min(`Product ID`), str_sub(max(`Product ID`),3,4), sep = "-")), .by = group) %>%
  select(`Product ID`)

identical(result, test)
# [1] TRUE
  • Logic:

    • Reads the workbook ranges needed for the challenge

    • Aggregates or ranks values at the relevant grouping level

    • Builds the intermediate columns that drive the final result

    • Parses the text patterns directly instead of relying on manual cleanup

  • Strengths:

    • The R solution stays close to the workbook rule and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the sheet structure and source ranges remain stable.
  • Gem:

    • The strongest part of the solution is choosing the right intermediate representation before shaping the final output.
import pandas as pd

input = pd.read_excel("CH-046 Numbers Cleaning.xlsx",  usecols="B", skiprows=1, nrows=15)
test = pd.read_excel("CH-046 Numbers Cleaning.xlsx",  usecols="J", skiprows=1, nrows=5)
test.columns = ["Product ID"]

result = input.sort_values("Product ID").reset_index(drop=True) 
result["lag"] = result["Product ID"].shift(1)
result["diff"] = result["Product ID"] - result["lag"]
result["diff"] = result["diff"].fillna(1.0)
result["cumsum"] = result["diff"].gt(1).cumsum()

result2 = result.groupby("cumsum").agg({"Product ID": ["max", "min"]}) 
result2.columns = result2.columns.droplevel(0)
result2["Product ID"] = result2.apply(lambda x: x["min"] if x["min"] == x["max"] else str(x["min"]) + "-" + str(x["max"])[2:4], axis=1)
result2 = result2.drop(columns=["max", "min"]).reset_index(drop=True)

print(result2.equals(test)) # True
  • Logic:

    • Reads the workbook ranges needed for the challenge

    • Aggregates or ranks values at the relevant grouping level

  • Strengths:

    • The Python version follows the same rule in a direct dataframe-oriented implementation.
  • Areas for Improvement:

    • The code assumes the workbook layout remains stable, so any sheet redesign would require small adjustments.
  • Gem:

    • The implementation stays close to the original workbook rule instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate:

  • The core logic is clear, but the correct transformation pattern is not obvious from the raw input.

  • The challenge combines multiple reshaping, grouping, or parsing steps.