Excel BI - PowerQuery Challenge 204

excel-challenges
power-query
Col1 Match Col2 Match Col3 Match Col4 Match
Published

March 24, 2026

Illustration for Excel BI - PowerQuery Challenge 204

Challenge Description

Col1 Match Col2 Match Col3 Match Col4 Match

Solutions

library(tidyverse)
library(readxl)

path = "Power Query/PQ_Challenge_204.xlsx"
input = read_excel(path, range = "A1:D7")
test = read_excel(path, range = "F1:I4")

count_intersections <- function(col_name, df) {
  col = df[[col_name]] %>% na.omit()
  other_cols = df %>% select(-all_of(col_name)) %>% map(na.omit)
  
  intersection_counts = other_cols %>%
    map_int(~ length(intersect(col, .x)))
  
  filtered_counts = intersection_counts[intersection_counts > 0]
  filtered_names = names(filtered_counts)
  
  map2_chr(filtered_names, filtered_counts, ~ paste(.x, "-", .y)) %>%
    paste(collapse = ", ")
}

result = map_chr(names(input), ~ count_intersections(.x, input))

result1 = tibble(
  Column = paste(names(input), "Match"),
  Intersections = result
) %>%
  separate_rows(Intersections, sep = ", ") %>%
  mutate(nr = row_number(), .by = Column) %>%
  pivot_wider(names_from = Column, values_from = Intersections) %>%
  select(-nr)

identical(result1, test)
# [1] TRUE
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Builds helper columns that drive the final output

  • Strengths:

    • The R solution stays close to the workbook logic and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the workbook layout and selected ranges remain stable.
  • Gem:

    • The best part of the solution is choosing the right intermediate shape before formatting the final output.
import pandas as pd

path = "PQ_Challenge_204.xlsx"
input = pd.read_excel(path, usecols="A:D")
test = pd.read_excel(path, usecols="F:I", nrows=3)

def count_intersections(col_name, df):
    col = df[col_name].dropna()
    other_cols = df.drop(col_name, axis=1).apply(lambda x: x.dropna())
    intersection_counts = other_cols.apply(lambda x: len(set(col) & set(x)))
    filtered_counts = intersection_counts[intersection_counts > 0]
    filtered_names = filtered_counts.index
    result = [f"{name} - {count}" for name, count in zip(filtered_names, filtered_counts)]
    return ", ".join(result)

result = [count_intersections(col, input) for col in input.columns]
result1 = pd.DataFrame({
    "Column": [f"{col} Match" for col in input.columns],
    "Intersections": result
})
result1["Intersections"] = result1["Intersections"].str.split(", ")
result1 = result1.explode("Intersections")
result1["nr"] = result1.groupby("Column").cumcount() + 1
result1 = result1.pivot(index="nr", columns="Column", values="Intersections").reset_index(drop=True)
result1.columns.name = None

print(result1.equals(test)) # True
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Aggregates or ranks values at the relevant grouping level

    • Applies the rule iteratively until the output is complete

  • Strengths:

    • The Python version follows the same workbook rule in a direct pandas-oriented implementation.
  • Areas for Improvement:

    • As with the R version, any workbook layout change would require small adjustments.
  • Gem:

    • The implementation stays close to the source challenge instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate:

  • It combines reshaping, grouping, or parsing steps that are common in Power Query style problems.

  • The main challenge is reproducing the workbook output structure exactly.