Excel BI - Excel Challenge 634

Published

January 20, 2025

Challenge Description

🔰Group every five rows of the question table and then provide some of quantity for each group

🔗 Link to Excel file: 👉https://lnkd.in/dHgBqfH4

Solutions

library(tidyverse)
library(readxl)

path = "Excel/634 Array Equality.xlsx"
input = read_excel(path, range = "A2:B10")
test  = read_excel(path, range = "D2:E7")

result = input %>%
  mutate(
    set_array1 = map(Array1, ~ sort(unique(strsplit(.x, ",")[[1]]))),  # Split, deduplicate, and sort Array1
    set_array2 = map(Array2, ~ sort(unique(strsplit(.x, ",")[[1]])))   # Split, deduplicate, and sort Array2
  ) %>%
  mutate(result = map2(set_array1, set_array2, ~ .x %>% setequal(.y))) # Compare the processed arrays
  filter(result == TRUE) %>%                                           # Filter rows where arrays are equal
  select(Array1, Array2)                                               # Keep only Array1 and Array2 columns

all.equal(result, test, check.attributes = FALSE)
#> [1] TRUE
  • Logic:

    • strsplit: Splits array strings into individual elements using , as the delimiter.

    • unique and sort: Deduplicates and sorts elements for consistency.

    • setequal: Checks if the two arrays contain the same elements, regardless of order.

    • filter: Keeps rows where arrays are equal.

  • Strengths:

    • Clean Transformation: Uses tidyverse functions for concise and readable code.

    • Flexibility: Handles arrays of varying lengths and ensures order doesn’t affect the comparison.

  • Areas for Improvement:

    • Performance: Processing could be optimized for very large datasets with many rows.
  • Gem:

    • The use of setequal ensures robust and order-independent equality checks.
import pandas as pd

path = "634 Array Equality.xlsx"
input = pd.read_excel(path,  usecols="A:B", skiprows=1, nrows=9)
test = pd.read_excel(path,  usecols="D:E", skiprows=1, nrows=5).rename(columns=lambda x: x.split('.')[0])

def split_sort_unique(s):
    return sorted(set(s.split(',')))  # Split, deduplicate, and sort

input['set_array1'] = input['Array1'].apply(split_sort_unique)  # Process Array1
input['set_array2'] = input['Array2'].apply(split_sort_unique)  # Process Array2
input['result'] = input.apply(lambda row: row['set_array1'] == row['set_array2'], axis=1)  # Compare arrays
result = input[input['result'] == True][['Array1', 'Array2']].reset_index(drop=True)  # Filter and reset index

print(result)
print(test)
  • Logic:

    • split_sort_unique: Splits array strings into individual elements, removes duplicates, and sorts.

    • apply: Applies the processing function to both arrays.

    • Row-wise comparison: Checks if the processed arrays are equal.

    • Filtering: Keeps rows where arrays are equal.

  • Strengths:

    • Explicit Logic: Each step is modular and easy to understand.

    • Flexibility: Handles variations in input data robustly.

  • Areas for Improvement:

    • Performance: Row-wise operations (apply) may be slower for very large datasets; vectorization could improve speed.
  • Gem:

    • The use of sorted(set(...)) effectively ensures that order and duplicates don’t affect the comparison.

Difficulty Level

This task is moderate:

  • Requires knowledge of string processing, deduplication, and order-independent comparison.

  • Involves working with potentially variable-length arrays.