Omid - Challenge 254

data-challenges
advanced-exercises
🔰 Then, divide the resulting parts into two groups from the middle: The first group should populate the initial columns.
Published

March 24, 2026

Illustration for Omid - Challenge 254

Challenge Description

🔰 Then, divide the resulting parts into two groups from the middle: The first group should populate the initial columns.

Solutions

library(tidyverse)
library(readxl)

path = "files/200-299/254/CH-254 Column Splitting.xlsx"
input = read_excel(path, range = "B2:B7")
test  = read_excel(path, range = "D2:I7")

result = input %>%
  mutate(
    hyphen_count = str_count(ID, "-"),
    middle_hyphen = ceiling((hyphen_count + 1) / 2),
    hyphen_loc = str_locate_all(ID, "-")) %>%
  rowwise() %>%
  mutate(middle_hyphen_loc = hyphen_loc[middle_hyphen, 1]) %>%
  ungroup() %>%
  mutate(
    first_part = str_sub(ID, 1, middle_hyphen_loc - 1),
    second_part = str_sub(ID, middle_hyphen_loc + 1, nchar(ID))) %>%
  separate_wider_delim(first_part, delim = "-", too_few = "align_start", names = c("ID.1","ID.2", "ID.3")) %>%
  separate_wider_delim(second_part, delim = "-", too_few = "align_end", names = c("ID.4","ID.5", "ID.6")) %>% 
  select(starts_with("ID."))

all.equal(result, test) 
# > [1] TRUE
  • Logic:

    • Reads the workbook ranges needed for the challenge

    • Builds the intermediate columns that drive the final result

    • Parses the text patterns directly instead of relying on manual cleanup

  • Strengths:

    • The R solution stays close to the workbook rule and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the sheet structure and source ranges remain stable.
  • Gem:

    • The strongest part of the solution is choosing the right intermediate representation before shaping the final output.
import pandas as pd

path = "200-299/254/CH-254 Column Splitting.xlsx"
input_df = pd.read_excel(path, usecols="B", skiprows=1, nrows=6)
test = pd.read_excel(path, usecols="D:I", skiprows=1, nrows=6).fillna("")

def split_id(id_):
    parts = id_.split('-')
    n = len(parts)
    mid = (n + 1) // 2
    first = parts[:mid]
    second = parts[mid:]
    first += [''] * (3 - len(first))
    second = [''] * (3 - len(second)) + second
    return first + second

result = input_df.iloc[:,0].apply(split_id)
result = pd.DataFrame(result.tolist(), columns=[f'ID.{i}' for i in range(1,7)])
result = result.astype(str)
test = test.astype(str)

print(result.equals(test)) # True
  • Logic:

    • Reads the workbook ranges needed for the challenge

    • Applies the rule iteratively until the output stabilizes

  • Strengths:

    • The Python version follows the same rule in a direct dataframe-oriented implementation.
  • Areas for Improvement:

    • The code assumes the workbook layout remains stable, so any sheet redesign would require small adjustments.
  • Gem:

    • The implementation stays close to the original workbook rule instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate:

  • The core logic is clear, but the correct transformation pattern is not obvious from the raw input.

  • The challenge combines multiple reshaping, grouping, or parsing steps.