Excel BI - PowerQuery Challenge 195

excel-challenges
power-query
Items Unit Price Quantity Stockist Amount Paid A, B
Published

March 24, 2026

Illustration for Excel BI - PowerQuery Challenge 195

Challenge Description

Items Unit Price Quantity Stockist Amount Paid A, B

Solutions

library(tidyverse)
library(readxl)

path = "Power Query/PQ_Challenge_195.xlsx"
input1 = read_xlsx(path, range = "A1:C5")
input2 = read_xlsx(path, range = "A8:B11")                
test = read_xlsx(path, range = "F1:G4")

result1 = input1 %>%
  mutate(across(everything(), ~str_split(.x, "\\W+"))) %>%
  unnest(cols = everything()) %>%
  mutate(total = as.numeric(`Unit Price`) * as.numeric(Quantity)) %>%
  select(Items, total)

result2 = input2 %>%
  mutate(across(everything(), ~str_split(.x, "\\W+"))) %>%
  unnest(cols = everything()) %>%
  mutate(part = n(), .by = Items)

result = result2 %>%
  left_join(result1, by = "Items") %>%
  mutate(paid_by_stockist = total/part) %>%
  summarise(`Amount Paid` = sum(paid_by_stockist, na.rm = T), .by = Stockist)

identical(result, test)
# [1] TRUE
  • Logic:

    • Reshapes the data into the structure required by the result table

    • Aggregates or ranks values at the relevant grouping level

    • Builds helper columns that drive the final output

    • Uses direct pattern parsing where the workbook encodes logic in text

  • Strengths:

    • The R solution stays close to the workbook logic and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the workbook layout and selected ranges remain stable.
  • Gem:

    • The best part of the solution is choosing the right intermediate shape before formatting the final output.
import pandas as pd
import re

def split_string(s):
    return re.split(r"\W+", str(s))

path = "PQ_Challenge_195.xlsx"
input1 = pd.read_excel(path, usecols="A:C", nrows=5)
input2 = pd.read_excel(path, usecols="A:B", skiprows=7, nrows=3)
test = pd.read_excel(path, usecols="F:G", nrows=3)

input1 = pd.concat([input1[col].apply(split_string).explode().reset_index(drop=True) for col in input1.columns], axis=1)

input2["Items"] = input2["Items"].apply(split_string)
input2 = pd.DataFrame(input2["Items"].explode()).\
    merge(input2, left_index=True, right_index=True).\
    drop(columns=["Items_y"]).rename(columns={"Items_x": "Items"})
input2["Stockist_no"] = input2.groupby("Items")["Stockist"].transform("count")
input2 = pd.merge(input1, input2, on="Items", how="right").dropna().reset_index(drop=True)
input2["Amount Paid"] = (input2["Unit Price"].astype(int) * input2["Quantity"].astype(int)) / input2["Stockist_no"]
input2["Amount Paid"] = input2["Amount Paid"].astype("int64")

result = input2.groupby("Stockist")["Amount Paid"].sum().reset_index()

print(result.equals(test)) # True
  • Logic:

    • Reads the workbook range needed for the challenge

    • Aggregates or ranks values at the relevant grouping level

    • Uses direct pattern parsing where the workbook encodes logic in text

    • Applies the rule iteratively until the output is complete

  • Strengths:

    • The Python version follows the same workbook rule in a direct pandas-oriented implementation.
  • Areas for Improvement:

    • As with the R version, any workbook layout change would require small adjustments.
  • Gem:

    • The implementation stays close to the source challenge instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate:

  • It combines reshaping, grouping, or parsing steps that are common in Power Query style problems.

  • The main challenge is reproducing the workbook output structure exactly.