Excel BI - PowerQuery Challenge 226

Published

March 24, 2026

Challenge Description

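Reshape a department-level table into one row per employee.

Input columns: Dept ID, Highest Paid Employee, Promoted Employees, Not Promoted Employees
Output columns: Dept ID, Emp Names, Promotion Date, Salary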

Solutions

# R solution using the tidyverse
library(tidyverse)
library(readxl)

path = "Power Query/PQ_Challenge_226.xlsx"
input = read_excel(path, range = "A1:D13")  # input table
test  = read_excel(path, range = "F1:I19")  # expected result

result = input %>%
  # fill down Dept ID, drop the unused column, unpivot the employee columns
  fill(`Dept ID`) %>%
  select(-`Highest Paid Employee`) %>%
  pivot_longer(-`Dept ID`, values_to = "Value") %>%
  # split each hyphen-delimited value into its three parts
  separate(Value, into = c("Emp Names", "Salary", "Promotion Date"), sep = "-") %>%
  select(-name) %>%
  filter(!is.na(`Emp Names`)) %>%
  arrange(`Dept ID`, `Emp Names`) %>%
  # convert text to the types used in the expected table
  mutate(`Promotion Date` = as.POSIXct(`Promotion Date`, format = "%m/%d/%Y", tz = "UTC"),
         Salary = as.numeric(Salary)) %>%
  select(`Dept ID`, `Emp Names`, `Promotion Date`, Salary)

all.equal(result, test, check.attributes = FALSE)
#> [1] TRUE
  • Logic:

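    • Reads the input table from A1:D13 and the expected result from F1:I19

    • Fills down Dept ID, drops the Highest Paid Employee column, and pivots the remaining columns into long form

    • Splits each value on "-" into Emp Names, Salary, and Promotion Date, then drops empty rows, sorts by Dept ID and Emp Names, and converts the types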

  • Strengths:

    • The R solution stays close to the workbook logic and keeps the transformation compact.
  • Areas for Improvement:

    • The ranges A1:D13 and F1:I19 are hard-coded, so the code needs editing if rows are added or the tables move.
  • Gem:

    • The best part of the solution is choosing the right intermediate shape before formatting the final output.
# Python solution using pandas
import pandas as pd

path = "PQ_Challenge_226.xlsx"
# Input table in columns A:D, expected result in columns F:I.
# The rename strips the ".1" suffix that pandas can add to repeated headers.
df = pd.read_excel(path, usecols="A:D", nrows=13)
test = pd.read_excel(path, usecols="F:I", nrows=19).rename(columns=lambda x: x.replace('.1', ''))

# Fill down Dept ID, drop the unused column, and unpivot the employee columns.
df['Dept ID'] = df['Dept ID'].ffill().astype('int64')
df = df.drop(columns=['Highest Paid Employee']).melt(id_vars=['Dept ID'], var_name='name', value_name='Value')
# Split each hyphen-delimited value into its three parts.
df[['Emp Names', 'Salary', 'Promotion Date']] = df['Value'].str.split('-', expand=True)
df = df.drop(columns=['name', 'Value']).dropna(subset=['Emp Names']).sort_values(by=['Dept ID', 'Emp Names'])
# Convert text to the types used in the expected table.
df['Promotion Date'] = pd.to_datetime(df['Promotion Date'], format="%m/%d/%Y")
df['Salary'] = pd.to_numeric(df['Salary']).astype('int64')

result = df[['Dept ID', 'Emp Names', 'Promotion Date', 'Salary']].reset_index(drop=True)

print(result.equals(test)) # True
  • Logic:

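    • Reads the input table from columns A:D and the expected result from columns F:I

    • Forward-fills Dept ID, drops the Highest Paid Employee column, and melts the remaining columns into long form

    • Splits each value on "-" into Emp Names, Salary, and Promotion Date, then drops empty rows, sorts, and converts the types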

  • Strengths:

    • The Python version mirrors the R pipeline step for step (ffill, melt, str.split, type conversion), so the two solutions are easy to compare.
  • Areas for Improvement:

    • As with the R version, the column letters and row counts are hard-coded, so a workbook layout change means adjusting usecols and nrows.
  • Gem:

    • The implementation stays close to the source challenge instead of adding unnecessary abstraction.
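
The fill-down, unpivot, and split steps can be tried without the workbook. The following is a minimal sketch, not the author's code: the sample rows are made up, the helper name unpivot_employees is hypothetical, and it assumes that entries for employees who were not promoted carry no date part.

```python
import pandas as pd

# Hypothetical sample rows; names, salaries, and dates are invented.
raw = pd.DataFrame({
    "Dept ID": [1, None, 2],
    "Highest Paid Employee": ["Ann", None, "Cid"],
    "Promoted Employees": ["Ann-900-01/15/2020", "Bob-700-03/01/2021", None],
    "Not Promoted Employees": ["Dan-500", None, "Cid-800"],
})


def unpivot_employees(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per employee (hypothetical helper)."""
    out = df.copy()
    # Fill down the department id left blank on continuation rows.
    out["Dept ID"] = out["Dept ID"].ffill().astype("int64")
    # Unpivot the employee columns and discard empty cells.
    long = (
        out.drop(columns=["Highest Paid Employee"])
        .melt(id_vars=["Dept ID"], value_name="Value")
        .dropna(subset=["Value"])
    )
    # Split on "-"; reindex guarantees three parts even if no row has a date.
    parts = long["Value"].str.split("-", expand=True).reindex(columns=range(3))
    parts.columns = ["Emp Names", "Salary", "Promotion Date"]
    tidy = pd.DataFrame({
        "Dept ID": long["Dept ID"],
        "Emp Names": parts["Emp Names"],
        "Promotion Date": pd.to_datetime(parts["Promotion Date"], format="%m/%d/%Y"),
        "Salary": pd.to_numeric(parts["Salary"]).astype("int64"),
    })
    return tidy.sort_values(["Dept ID", "Emp Names"]).reset_index(drop=True)


result = unpivot_employees(raw)
print(result)
```

Splitting into a separate frame and reindexing to three columns keeps the column assignment from failing when a sample happens to contain no promotion dates.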

Difficulty Level

This task is moderate:

  • It combines fill-down, unpivoting, and splitting delimited text, steps that are common in Power Query style problems.

  • The main challenge is reproducing the workbook output structure exactly, including column order, sort order, and data types.
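
When an exact comparison fails, it helps to see which part of the structure differs. The sketch below is one possible approach, using made-up frames and a hypothetical helper named explain_mismatch. It relies on pandas.testing.assert_frame_equal, which reports the first difference it finds.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def explain_mismatch(result: pd.DataFrame, expected: pd.DataFrame) -> str:
    """Return "match" or the first difference pandas reports (hypothetical helper)."""
    try:
        # Strict by default: values, dtypes, index, and column order must agree.
        assert_frame_equal(result, expected)
    except AssertionError as err:
        return str(err)
    return "match"


# Hypothetical frames with equal values but a different Salary dtype.
expected = pd.DataFrame({"Dept ID": [1, 2], "Salary": [900, 800]})
result = expected.astype({"Salary": "float64"})

print(result.equals(expected))  # False, because the dtypes differ
print(explain_mismatch(result, expected))
```

This kind of check shows why the solutions cast Salary and Promotion Date explicitly before comparing against the expected table.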