Excel BI - PowerQuery Challenge 303

excel-challenges
power-query
Extract the corresponding values and pivot them. Also show a total row and a total column. If there are more than one row for an employee, then values are the sum.
Published

March 24, 2026

Illustration for Excel BI - PowerQuery Challenge 303

Challenge Description

Extract the corresponding values and pivot them. Also show a total row and a total column. If there are more than one row for an employee, then values are the sum.

Solutions

library(tidyverse)
library(readxl)

path = "Power Query/300-399/303/PQ_Challenge_303.xlsx"
input = read_excel(path, range = "A1:A10")
test  = read_excel(path, range = "C1:H5") %>%
  mutate(across(everything(), ~replace_na(.x, "-")))

result = input %>%
  separate_longer_delim(col = `Travel Data`, delim = ", ") %>%
  separate_wider_delim(col = `Travel Data`, delim = ": ", names = c("key", "value")) %>%
  mutate(emp = cumsum(ifelse(key == "Employee ID", 1, 0))) %>%
  mutate(date = cumsum(ifelse(key == "Date", 1, 0)), .by = emp) %>%
  pivot_wider(names_from = key, values_from = value) %>%
  fill(`Employee ID`, `Employee Name`) %>%
  filter(date != 0) %>%
  summarise(`Employee ID` = first(`Employee ID`),
            `Employee Name` = first(`Employee Name`),
            Hotel = sum(as.numeric(Hotel), na.rm = TRUE), 
            `Per Diem` = sum(as.numeric(`Per Diem`), na.rm = TRUE),
            Transport = sum(as.numeric(Transport), na.rm = TRUE),
            .by = `Employee ID`) %>%
  janitor::adorn_totals(c("row", "col")) 

all.equal(result, test, check.attributes = FALSE)
# > [1] TRUE
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Aggregates or ranks values at the relevant grouping level

    • Builds helper columns that drive the final output

  • Strengths:

    • The R solution stays close to the workbook logic and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the workbook layout and selected ranges remain stable.
  • Gem:

    • The best part of the solution is choosing the right intermediate shape before formatting the final output.
import pandas as pd

path = "300-399/303/PQ_Challenge_303.xlsx"
input = pd.read_excel(path, usecols="A", nrows=10)
test = pd.read_excel(path, usecols="C:H", nrows=4).fillna("").assign(**{'Employee ID': lambda df: df['Employee ID'].astype(str)})
test.columns = test.columns.str.strip()

pairs = (
    input['Travel Data']
    .str.split(', ').explode().str.split(': ', expand=True)
    .rename(columns={0:'key', 1:'val'})
    .assign(emp=lambda d: d.key.eq('Employee ID').cumsum(),
            day=lambda d: d.groupby('emp').key.transform(lambda s: s.eq('Date').cumsum()))
)

wide = (
    pairs.pivot_table(index=['emp','day'], columns='key', values='val', aggfunc='first')
    .reset_index(drop=True)
)

for col in ['Hotel', 'Per Diem', 'Transport']:
    wide[col] = pd.to_numeric(wide[col].replace('', 0).fillna(0))
wide[['Employee ID', 'Employee Name']] = wide[['Employee ID', 'Employee Name']].ffill()

res = (
    wide.groupby('Employee ID', as_index=False)
    .agg({'Employee Name': 'first', 'Hotel': 'sum', 'Per Diem': 'sum', 'Transport': 'sum'})
)
res = res.sort_values(by='Employee ID', ascending=False).reset_index(drop=True)
res['Total'] = res[['Hotel', 'Per Diem', 'Transport']].sum(axis=1).astype(int)
res = pd.concat([res, pd.DataFrame([{
    'Employee ID': 'Total',
    'Employee Name': '',
    'Hotel': res['Hotel'].sum(),
    'Per Diem': res['Per Diem'].sum(),
    'Transport': res['Transport'].sum(),
    'Total': res['Total'].sum()
}])], ignore_index=True)

print(res.equals(test))
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Aggregates or ranks values at the relevant grouping level

    • Builds helper columns that drive the final output

  • Strengths:

    • The Python version follows the same workbook rule in a direct pandas-oriented implementation.
  • Areas for Improvement:

    • As with the R version, any workbook layout change would require small adjustments.
  • Gem:

    • The implementation stays close to the source challenge instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate:

  • It combines reshaping, grouping, or parsing steps that are common in Power Query style problems.

  • The main challenge is reproducing the workbook output structure exactly.