Excel BI - Excel Challenge 861

excel-challenges
excel-formulas
Published

March 24, 2026


Challenge Description

🔰 Group Company ID Price A 23:Walmart; 89:Exxon Exxon B 960:Microsoft, 24:IBM, 88 : Oracle Walmart

Solutions

library(tidyverse)
library(readxl)

path <- "Excel/800-899/861/861 Transpose.xlsx"
input <- read_excel(path, sheet = 2, range = "A2:B6", col_names = TRUE)
test <- read_excel(path, sheet = 2, range = "D2:G10")

# Strip spaces, split the packed cells, then unnest both list columns together
result <- input %>%
  mutate(across(everything(), ~ str_replace_all(., " ", ""))) %>%
  mutate(
    Company = str_split(Company, ";|,"),
    Revenue = str_split(Revenue, ",")
  ) %>%
  unnest(c(Company, Revenue)) %>%
  # Pull Code, ID, and Company out of each "Code-ID:Company" entry
  extract(
    Company,
    regex = "^([A-Za-z]+)-(\\d+):\\s*(.+)$",
    into = c("Code", "ID", "Company")
  ) %>%
  mutate(Revenue = as.numeric(Revenue), ID = as.numeric(ID)) %>%
  arrange(Code, Company)

all.equal(result, test)
# [1] TRUE
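  • Logic: Read the input range (A2:B6) and the expected output (D2:G10) from the second sheet; remove all spaces; split Company on ";" or "," and Revenue on ","; unnest both list columns together; extract Code, ID, and Company with one regular expression; convert Revenue and ID to numbers; sort by Code and Company.
  • Strengths: A single regular expression, ^([A-Za-z]+)-(\d+):\s*(.+)$, captures the Code, ID, and Company parts, so the extraction logic is easy to audit.
  • Areas for Improvement: The sheet index and cell ranges are hard-coded, so any change to the workbook layout requires adjusting them. Removing every space would also merge multi-word company names, and the paired unnest expects each row to hold as many revenues as companies.
  • Gem: Splitting both columns into lists and unnesting them together with unnest(c(Company, Revenue)) keeps each company aligned with its revenue in a single step.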
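
The paired split behind these points can be tried without the workbook. The following is a minimal sketch using only the Python standard library, not the author's code: the function name pair_up and the sample cells are hypothetical, the "Code-ID:Company" entry shape is inferred from the regular expression in the solution rather than from the sheet, and revenues are assumed to be whole numbers.

```python
import re


def pair_up(company_cell, revenue_cell):
    """Pair each packed company entry with the revenue at the same position."""
    # Drop every space first, as the solution does
    entries = re.split(r"[;,]", company_cell.replace(" ", ""))
    revenues = str(revenue_cell).replace(" ", "").split(",")

    # A paired unnest needs one revenue per company entry
    if len(entries) != len(revenues):
        raise ValueError(
            f"{len(entries)} entries but {len(revenues)} revenues"
        )
    return [(entry, int(revenue)) for entry, revenue in zip(entries, revenues)]


# Hypothetical cells, not copied from the workbook
print(pair_up("A-23:Walmart; B-89:Exxon", "100, 200"))
# [('A-23:Walmart', 100), ('B-89:Exxon', 200)]
```

The explicit length check makes the assumption visible: a row with more companies than revenues raises a ValueError here instead of producing misaligned pairs.
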
import pandas as pd
import re
import numpy as np

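path = "Excel/800-899/861/861 Transpose.xlsx"
# Second sheet: input in columns A:B, expected output in columns D:G
raw = pd.read_excel(path, sheet_name=1, usecols="A:B", skiprows=1, nrows=4)
# pandas can suffix repeated header names with ".1"; strip it so the names match
test = pd.read_excel(path, sheet_name=1, usecols="D:G", skiprows=1, nrows=8).rename(columns=lambda c: c.replace('.1', ''))
# Remove every space from text cells (DataFrame.map requires pandas 2.1+)
df = raw.map(lambda x: re.sub(r" ", "", x) if isinstance(x, str) else x)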

df = (
    df.assign(
        Company=df["Company"].str.split(r"[;,]"),
        Revenue=df["Revenue"].apply(lambda x: x.split(",") if isinstance(x, str) and "," in x else [x])
    )
    .explode(["Company", "Revenue"])
)
regex = r"^([A-Za-z]+)-(\d+):\s*(.+)$"
df[["Code", "ID", "Company"]] = (
    df["Company"]
    .str.extract(regex)
)
result = (
    df.assign(
        Revenue=lambda d: pd.to_numeric(d["Revenue"]).astype(np.int64),
        ID=lambda d: pd.to_numeric(d["ID"]).astype(np.int64)
    )
    .sort_values(["Code", "Company"])
    .reset_index(drop=True)
    [["Code", "ID", "Company", "Revenue"]]
)

print(result.equals(test)) # True

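The Python version follows the same steps as the R solution: it strips spaces, splits both columns, explodes them together, and applies the same regular expression through str.extract, which keeps the pattern matching easy to review.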
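
To see what the pattern accepts, here is a small sketch that applies the same regular expression with Python's re module. The helper name parse_entry and the sample entries are hypothetical; they follow the "Code-ID:Company" shape that the pattern implies.

```python
import re

# Same pattern as in the solutions above
PATTERN = re.compile(r"^([A-Za-z]+)-(\d+):\s*(.+)$")


def parse_entry(entry):
    """Return (code, id, company), or None if the entry does not fit."""
    match = PATTERN.match(entry)
    if match is None:
        return None
    code, id_text, company = match.groups()
    return code, int(id_text), company


# Hypothetical entries, not copied from the workbook
print(parse_entry("A-23:Walmart"))    # ('A', 23, 'Walmart')
print(parse_entry("B-88: Oracle"))    # ('B', 88, 'Oracle')
print(parse_entry("B-88 : Oracle"))   # None
print(parse_entry("B-88 : Oracle".replace(" ", "")))  # ('B', 88, 'Oracle')
```

The third call shows why both solutions strip spaces before matching: \s* tolerates whitespace after the colon but not before it.
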

Difficulty Level

Easy / Medium

The business rule is clear, but reaching the expected output takes several transformation steps: splitting the packed cells, a paired unnest, regex extraction, type conversion, and sorting.