Excel BI - PowerQuery Challenge 154

excel-challenges
power-query
Calculate the total fly time and rest time for a pilot in hours for all years and months.
Published

March 24, 2026

Illustration for Excel BI - PowerQuery Challenge 154

Challenge Description

Calculate the total fly time and rest time for a pilot in hours for all years and months.

Solutions

library(tidyverse)
library(readxl)

input = read_excel("Power Query/PQ_Challenge_154.xlsx", range = "A1:C10") %>% 
  janitor::clean_names()
test  = read_excel("Power Query/PQ_Challenge_154.xlsx", range = "E1:I23") %>%
  janitor::clean_names()

input$pilot = factor(input$pilot, levels = unique(input$pilot), ordered = TRUE)
test$pilot = factor(test$pilot, levels = unique(test$pilot), ordered = TRUE)

fly = input 
rest = input %>%
  group_by(pilot) %>%
  mutate(prev_end = lag(flight_end, default = NA_POSIXct_)) %>%
  ungroup() %>%
  select(pilot, rest_start = prev_end, rest_end = flight_start) %>%
  na.omit()

get_months = function(start, end) {
  seq = seq(floor_date(start, "month"), ceiling_date(end, "month"), by = "month")
  seq[1] <- start
  seq[length(seq)] <- end
  df = tibble(start = seq[1:(length(seq)-1)], end = seq[2:length(seq)])
  return(df)
}

a = fly %>%
  mutate(df = map2(flight_start, flight_end, get_months)) %>%
  unnest(df) %>%
  select(pilot, start, end) %>%
  mutate(mode = "fly")
b = rest %>%
  mutate(df = map2(rest_start, rest_end, get_months)) %>%
  unnest(df) %>%
  select(pilot, start, end) %>%
  mutate(mode = "rest")

result = bind_rows(a, b) %>%
  mutate(month = month(start),
         year = year(start),
         duration = difftime(end, start, "hours")) %>%
  group_by(pilot, mode, month, year) %>%
  summarise(duration = sum(duration, na.rm = TRUE) %>% as.numeric() %>% round(2)) %>%
  ungroup() %>%
  pivot_wider(names_from = mode, values_from = duration, 
              values_fill = list(duration = 0), names_glue = "{mode}_time") %>%
  left_join(test, by = c("pilot","year", "month"), suffix = c("", "_test")) %>%
  mutate(check_fly = fly_time == fly_time_test,
         check_rest = rest_time == rest_time_test)
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Aggregates or ranks values at the relevant grouping level

    • Builds helper columns that drive the final output

  • Strengths:

    • The R solution stays close to the workbook logic and keeps the transformation compact.
  • Areas for Improvement:

    • The code assumes the workbook layout and selected ranges remain stable.
  • Gem:

    • The best part of the solution is choosing the right intermediate shape before formatting the final output.
import pandas as pd

input_data = pd.read_excel("PQ_Challenge_154.xlsx", usecols="A:C", nrows=10)
input_data.columns = [c.strip().lower() for c in input_data.columns]
test = pd.read_excel("PQ_Challenge_154.xlsx", usecols="E:I", nrows=23)
test.columns = [c.strip().lower() for c in test.columns]

fly = input_data.copy()
rest = input_data.copy()
rest["prev_end"] = rest.groupby("pilot")["flight_end"].shift()
rest = rest.dropna(subset=["prev_end"])[["pilot", "prev_end", "flight_start"]].rename(columns={"prev_end": "rest_start", "flight_start": "rest_end"})

def split_months(start, end):
    points = [start]
    current = pd.Timestamp(start).replace(day=1) + pd.offsets.MonthBegin(1)
    while current < end:
        points.append(current)
        current = current + pd.offsets.MonthBegin(1)
    points.append(end)
    return pd.DataFrame({"start": points[:-1], "end": points[1:]})

fly_parts = []
for _, row in fly.iterrows():
    part = split_months(row["flight_start"], row["flight_end"])
    part["pilot"] = row["pilot"]
    part["mode"] = "fly"
    fly_parts.append(part)
rest_parts = []
for _, row in rest.iterrows():
    part = split_months(row["rest_start"], row["rest_end"])
    part["pilot"] = row["pilot"]
    part["mode"] = "rest"
    rest_parts.append(part)

result = pd.concat(fly_parts + rest_parts, ignore_index=True)
result["month"] = pd.to_datetime(result["start"]).dt.month
result["year"] = pd.to_datetime(result["start"]).dt.year
result["duration"] = (pd.to_datetime(result["end"]) - pd.to_datetime(result["start"])).dt.total_seconds() / 3600
result = result.groupby(["pilot", "mode", "month", "year"], as_index=False)["duration"].sum()
result["duration"] = result["duration"].round(2)
result = result.pivot(index=["pilot", "month", "year"], columns="mode", values="duration").fillna(0).reset_index()
result.columns.name = None
result = result.rename(columns={"fly": "fly_time", "rest": "rest_time"})
result = result.merge(test, on=["pilot", "year", "month"], how="left", suffixes=("", "_test"))
result["check_fly"] = result["fly_time"] == result["fly_time_test"]
result["check_rest"] = result["rest_time"] == result["rest_time_test"]

print(result.equals(test.assign(check_fly=True, check_rest=True)))
  • Logic:

    • Reads the workbook range needed for the challenge

    • Reshapes the data into the structure required by the result table

    • Aggregates or ranks values at the relevant grouping level

    • Builds helper columns that drive the final output

  • Strengths:

    • The Python version follows the same workbook rule in a direct pandas-oriented implementation.
  • Areas for Improvement:

    • As with the R version, any workbook layout change would require small adjustments.
  • Gem:

    • The implementation stays close to the source challenge instead of adding unnecessary abstraction.

Difficulty Level

This task is moderate:

  • It combines reshaping, grouping, or parsing steps that are common in Power Query style problems.

  • The main challenge is reproducing the workbook output structure exactly.