library(tidyverse)
library(readxl)
= "Power Query/PQ_Challenge_254.xlsx"
path = read_excel(path, range = "A1:Q5")
input = read_excel(path, range = "A9:C19")
test
= input %>%
result pivot_longer(
cols = -Dept,
names_to = c(".value", "person"),
names_pattern = "(.*)(\\d+)"
%>%
) na.omit() %>%
select(-person) %>%
unite("Age & Nationality & Salary", Age, Nationality, Salary, sep = ", ")
all.equal(result, test, check.attributes = FALSE)
#> [1] TRUE
Excel BI - PowerQuery Challenge 254
Challenge Description
🔰Group every five rows of the question table and then provide some of quantity for each group
lnkd.in 🔗 Link to Excel file: 👉https://lnkd.in/gvWMZVcm>
Solutions
Logic:
pivot_longer
: Converts wide data to long format by separating column names into base names and numeric identifiers.unite
: Concatenates selected columns (Age
,Nationality
,Salary
) into a single string column.na.omit
: Removes rows with missing values.
Strengths:
Compact Transformation: The use of
pivot_longer
andunite
simplifies reshaping and formatting.Readability: Tidyverse functions make the process easy to follow.
Areas for Improvement:
- Dynamic Column Handling: Ensure the solution dynamically adapts to column name variations or additional fields.
Gem:
- The regex
(.*)(\\d+)
effectively extracts base column names and their associated numbers.
- The regex
import pandas as pd
= "PQ_Challenge_254.xlsx"
path input = pd.read_excel(path, usecols="A:Q", nrows=5)
= pd.read_excel(path, usecols="A:C", skiprows=8, nrows=11).sort_values("Dept").reset_index(drop=True)
test
= pd.melt(input, id_vars=[input.columns[0]], var_name='Variable', value_name='Value')
input_long 'Name', 'Number']] = input_long['Variable'].str.extract(r'([a-zA-Z]+)(\d+)')
input_long[[=['Variable'], inplace=True)
input_long.drop(columns=['Value'], inplace=True)
input_long.dropna(subset'Name'].isin(['Salary', 'Age']), 'Value'] = input_long.loc[input_long['Name'].isin(['Salary', 'Age']), 'Value'].astype(int)
input_long.loc[input_long[= input_long.pivot_table(index=['Dept', 'Number'], columns='Name', values='Value', aggfunc='first').reset_index()
input_pivot 'Age & Nationality & Salary'] = input_pivot[['Age', 'Nationality', 'Salary']].astype(str).agg(', '.join, axis=1)
input_pivot[=['Number', 'Age', 'Nationality', 'Salary'], inplace=True)
input_pivot.drop(columns= input_pivot.rename_axis(None, axis=1)
input_pivot
print(input_pivot.equals(test)) # True
Logic:
pd.melt
: Converts wide data to long format for easier manipulation.str.extract
: Splits column names into base names (Name
) and numeric identifiers (Number
).pivot_table
: Reshapes the data into a grouped format.Column concatenation: Combines multiple fields into a single formatted column.
Strengths:
Modularity: Each transformation step is clearly defined and reusable.
Flexibility: Handles data aggregation and formatting dynamically.
Areas for Improvement:
- Error Handling: Ensure robust handling of unexpected data types or missing columns.
Gem:
- The use of
str.extract
for splitting column names based on a regex is concise and adaptable.
- The use of
Difficulty Level
This task is moderate:
Requires reshaping and aggregating data, both of which are common but non-trivial transformations.
Demands familiarity with regex for parsing column names.