Excel BI - PowerQuery Challenge 331

excel-challenges
power-query
Customer Date Product1 Product2 Product3 Units1
Published

March 24, 2026

Illustration for Excel BI - PowerQuery Challenge 331

Challenge Description

Customer Date Product1 Product2 Product3 Units1

Solutions

library(tidyverse)
library(readxl)

path = "Power Query/300-399/331/PQ_Challenge_331.xlsx"
input1 = read_excel(path, range = "A1:H10")
input2 = read_excel(path, range = "J1:K8")
test  = read_excel(path, range = "A15:C26")

result = input1 %>%
  pivot_longer(-c(Customer, Date), 
               names_to  = c(".value", "set"),
               names_pattern = "(Product|Units)(\\d+)") %>%
  left_join(input2, by = c("Product" = "Fruits")) %>%
  na.omit() %>%
  summarise(Amount = sum(Units * Price, na.rm = T), .by = c(Customer, Product)) %>%
  arrange(Customer, Product)

all.equal(result, test)
# [1] TRUE
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Aggregates or ranks values at the relevant grouping level

  • Strengths:

    • The R solution stays close to the workbook logic and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the workbook layout and selected ranges remain stable.
  • Gem:

    • The best part of the solution is choosing the right intermediate shape before formatting the final output.
import pandas as pd

path = "300-399/331/PQ_Challenge_331.xlsx"
input1 = pd.read_excel(path, usecols="A:H", nrows=10)
input2 = pd.read_excel(path, usecols="J:K", nrows=8)
test   = pd.read_excel(path, usecols="A:C", skiprows=14, nrows=12)

input1_long = (
    input1.melt(id_vars=['Customer', 'Date'], var_name='col')
    .assign(set=lambda df: df['col'].str.extract(r'(\d+)')[0],
            type=lambda df: df['col'].str.extract(r'(Product|Units)')[0])
    .pivot(index=['Customer', 'Date', 'set'], columns='type', values='value')
    .reset_index()
)
result = (
    input1_long.merge(input2, left_on='Product', right_on='Fruits')
    .dropna(subset=['Product', 'Units', 'Price'])
    .assign(Units=lambda df: pd.to_numeric(df['Units']),
            Price=lambda df: pd.to_numeric(df['Price']),
            Amount=lambda df: (df['Units'] * df['Price']).astype(int))
    .groupby(['Customer', 'Product'], as_index=False)['Amount'].sum()
    .sort_values(['Customer', 'Product'])
    .reset_index(drop=True)
)

print(result.equals(test))
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Aggregates or ranks values at the relevant grouping level

    • Builds helper columns that drive the final output

  • Strengths:

    • The Python version follows the same workbook rule in a direct pandas-oriented implementation.
  • Areas for Improvement:

    • As with the R version, any workbook layout change would require small adjustments.
  • Gem:

    • The implementation stays close to the source challenge instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate:

  • It combines reshaping, grouping, or parsing steps that are common in Power Query style problems.

  • The main challenge is reproducing the workbook output structure exactly.