library(tidyverse)
library(readxl)
library(lubridate)
path <- "PowerQuery/300-399/348/PQ_Challenge_348.xlsx"
input <- read_excel(path, range = "A1:A96", col_names = FALSE)
test <- read_excel(path, range = "C1:H46")
result = input %>%
separate_wider_delim(`...1`, delim = ",", names_sep = "_") %>%
janitor::row_to_names(1) %>%
mutate(Timestamp = ymd_hms(Timestamp)) %>%
group_by(UserID) %>%
mutate(
SessionID = cumsum(
Timestamp - lag(Timestamp, default = first(Timestamp)) > 30 * 60
) +
1
) %>%
ungroup() %>%
group_by(UserID, SessionID) %>%
summarise(
StartTime = min(Timestamp),
EndTime = max(Timestamp),
PageCount = n(),
Duration = as.numeric(difftime(
max(Timestamp),
min(Timestamp),
units = "mins"
)),
.groups = "drop"
)
all.equal(result, test)
#TRUEExcel BI - PowerQuery Challenge 348
excel-challenges
power-query
UserID,Timestamp,Page UserID SessionID StartTime EndTime PageCount

Challenge Description
UserID,Timestamp,Page UserID SessionID StartTime EndTime PageCount
Solutions
Logic:
Reads the workbook range needed for the challenge
Aggregates or ranks values at the relevant grouping level
Builds helper columns that drive the final output
Strengths:
- The R solution stays close to the workbook logic and keeps the transformation compact.
Areas for Improvement:
- The code assumes the workbook layout and selected ranges remain stable.
Gem:
- The best part of the solution is choosing the right intermediate shape before formatting the final output.
import pandas as pd
excel_path = "Power Query/300-399/348/PQ_Challenge_348.xlsx"
input = pd.read_excel(excel_path, header=None, usecols="A", nrows=96)
test = pd.read_excel(excel_path, usecols="C:H", nrows=45)
df = input.iloc[:, 0].str.split(",", expand=True)
df.columns = df.iloc[0]
df = df.iloc[1:].reset_index(drop=True)
df["Timestamp"] = pd.to_datetime(
df["Timestamp"].str.strip().str.replace("T", " ").str.replace("Z", "").str[:19],
format="%Y-%m-%d %H:%M:%S",
)
df = df.sort_values(["UserID", "Timestamp"]).reset_index(drop=True)
df["delta_secs"] = df.groupby("UserID")["Timestamp"].diff().dt.total_seconds().fillna(0)
df["SessionID"] = (df["delta_secs"] > 1800).groupby(df["UserID"]).cumsum() + 1
result = (
df.groupby(["UserID", "SessionID"], as_index=False)
.agg(
StartTime=("Timestamp", "min"),
EndTime=("Timestamp", "max"),
PageCount=("Timestamp", "count"),
Duration=("Timestamp", lambda x: (x.max() - x.min()).total_seconds() / 60),
)
.sort_values(["UserID", "SessionID"])
.reset_index(drop=True)
)
result["Duration"] = result["Duration"].astype("int64")
test["Duration"] = test["Duration"].astype("int64")
print(result.equals(test))Logic:
Reads the workbook range needed for the challenge
Aggregates or ranks values at the relevant grouping level
Strengths:
- The Python version follows the same workbook rule in a direct pandas-oriented implementation.
Areas for Improvement:
- As with the R version, any workbook layout change would require small adjustments.
Gem:
- The implementation stays close to the source challenge instead of adding unnecessary abstraction.
Difficulty Level
This task is moderate:
It combines reshaping, grouping, or parsing steps that are common in Power Query style problems.
The main challenge is reproducing the workbook output structure exactly.