library(tidyverse)
library(readxl)
input1 = read_excel("files/CH-058 Stepped Tax.xlsx", range = "B2:D7")
input2 = read_excel("files/CH-058 Stepped Tax.xlsx", range = "F2:G7")
test = read_excel("files/CH-058 Stepped Tax.xlsx", range = "H2:H7")
input1$To = ifelse(input1$To == "Over", Inf, input1$To) %>% as.numeric()
result = input1 %>%
mutate(key = 1) %>%
full_join(input2 %>% mutate(key = 1), by = "key") %>%
select(-key) %>%
filter(From <= To) %>%
mutate(income_over_threshold = Income - From,
income_in_threshold = ifelse(Income >= From & Income <= To , T, F)) %>%
filter(income_over_threshold >= 0) %>%
arrange(`Person ID`) %>%
mutate(tax = ifelse(income_in_threshold,
income_over_threshold * `Tax Rate`,
(To - From) * `Tax Rate`)) %>%
summarise(Tax = sum(tax), .by = c(`Person ID`, Income)) %>%
select(Tax)
all(round(result$Tax, 1) == round(test$Tax, 1))
# TRUEOmid - Challenge 58
data-challenges
advanced-exercises
🔰 In this challenge, we aim to find an efficient way to calculate the stepped tax based on the tax rates presented in the question table.

Challenge Description
🔰 In this challenge, we aim to find an efficient way to calculate the stepped tax based on the tax rates presented in the question table.
Solutions
Logic:
Reads the workbook ranges needed for the challenge
Aggregates or ranks values at the relevant grouping level
Builds the intermediate columns that drive the final result
Strengths:
- The R solution stays close to the workbook rule and keeps the transformation compact.
Areas for Improvement:
- The code assumes the sheet structure and source ranges remain stable.
Gem:
- The strongest part of the solution is choosing the right intermediate representation before shaping the final output.
import pandas as pd
import numpy as np
input1 = pd.read_excel("CH-058 Stepped Tax.xlsx", usecols="B:D", skiprows=1, nrows = 6)
input2 = pd.read_excel("CH-058 Stepped Tax.xlsx", usecols="F:G", skiprows=1, nrows = 6)
test = pd.read_excel("CH-058 Stepped Tax.xlsx", usecols="H", skiprows=1, nrows = 6)
input1.loc[4, 'To'] = float('inf')
input1['key'] = 1
input2['key'] = 1
output = pd.merge(input1, input2, on='key')
output['income_over_threshold'] = output["Income"] - output["From"]
output['income_in_threshold'] = np.where((output["Income"] >= output["From"]) & (output["Income"] <= output["To"]), True, False)
output = output[output['income_over_threshold'] > 0].sort_values(by = ["Person ID"]).reset_index(drop = True)
output['tax'] = np.where(output['income_in_threshold'],
output['income_over_threshold'] * output['Tax Rate'],
(output['To'] - output['From']) * output['Tax Rate'])
output = output.groupby('Person ID').agg({'tax': 'sum'}).astype("float64").reset_index()
output['tax'] = output['tax'].round(2)
test['tax'] = test['Tax'].round(2)
print(all(output['tax'] == test['tax'])) # TrueLogic:
Reads the workbook ranges needed for the challenge
Aggregates or ranks values at the relevant grouping level
Strengths:
- The Python version follows the same rule in a direct dataframe-oriented implementation.
Areas for Improvement:
- The code assumes the workbook layout remains stable, so any sheet redesign would require small adjustments.
Gem:
- The implementation stays close to the original workbook rule instead of adding unnecessary abstraction.
Difficulty Level
This task is moderate:
- The business rule is readable, but the workbook still requires careful implementation to reach the expected layout.