Excel BI - PowerQuery Challenge 224

excel-challenges
power-query
Column1 Column2 Column3 Column4 Date Name
Published

March 24, 2026

Illustration for Excel BI - PowerQuery Challenge 224

Challenge Description

Column1 Column2 Column3 Column4 Date Name

Solutions

library(tidyverse)
library(readxl)
library(janitor)

path = "Power Query/PQ_Challenge_224.xlsx"
input = read_excel(path, range = "A1:D12")
test  = read_excel(path, range = "F1:I20")

result = input %>%
  mutate(date = ifelse(str_detect(Column1, "\\d"), Column1, NA)) %>%
  fill(date) %>%
  set_names(.[1, ]) %>%
  rename("Name" = 1, "date" = 5) %>%
  filter(!str_detect(Name, "\\d")) %>%
  mutate(date = coalesce(excel_numeric_to_date(as.numeric(date)), mdy(date))) %>%
  pivot_longer(-c(date, Name), names_to = "Data", values_to = "Value") %>%
  na.omit() %>%
  select(Date = date, Name, Data, Value) %>%
  mutate(Value = as.numeric(Value), 
         Date = as.POSIXct(Date))

all.equal(result, test, check.attributes = FALSE)
#> [1] TRUE
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Builds helper columns that drive the final output

    • Uses direct pattern parsing where the workbook encodes logic in text

  • Strengths:

    • The R solution stays close to the workbook logic and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the workbook layout and selected ranges remain stable.
  • Gem:

    • The best part of the solution is choosing the right intermediate shape before formatting the final output.
import pandas as pd
import numpy as np
from datetime import datetime

path = "PQ_Challenge_224.xlsx"
input = pd.read_excel(path, usecols="A:D", nrows=11)
test = pd.read_excel(path, usecols="F:I", nrows=20)

input['date'] = np.where(input['Column1'].str.contains(r'\d'), input['Column1'], np.nan)
input['date'] = input['date'].ffill()
input.columns = ['Name', 'Data1', 'Data2', 'Data3', "Date"]

input['has_letter'] = input['Data1'].str.contains(r'[a-zA-Z]', na=False)
input = input[~input['has_letter']]

input['Date'] = pd.to_datetime(input['Date'], format='%m/%d/%Y', errors='coerce')
input.loc[:, 'Data1':'Data3'] = input.loc[:, 'Data1':'Data3'].apply(pd.to_numeric, errors='coerce')
input = input.drop(columns='has_letter')

result = (input.melt(id_vars=['Date', 'Name'], var_name='Data', value_name='Value')
               .dropna()
               .sort_values(by=['Date', 'Name', 'Data'])
               .reset_index(drop=True))
result['Value'] = result['Value'].astype('int64')

test = test.sort_values(['Date', 'Name', 'Data']).reset_index(drop=True)

print(result.equals(test)) # True
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

  • Strengths:

    • The Python version follows the same workbook rule in a direct pandas-oriented implementation.
  • Areas for Improvement:

    • As with the R version, any workbook layout change would require small adjustments.
  • Gem:

    • The implementation stays close to the source challenge instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate:

  • It combines reshaping, grouping, or parsing steps that are common in Power Query style problems.

  • The main challenge is reproducing the workbook output structure exactly.