Excel BI - Excel Challenge 878

excel-challenges
excel-formulas
🔰 Extract REF IDs, E Mail IDs and Website Addresses from the given data.
Published

March 24, 2026

Illustration for Excel BI - Excel Challenge 878

Challenge Description

🔰 Extract REF IDs, E Mail IDs and Website Addresses from the given data.

Solutions

library(tidyverse)
library(readxl)
library(rebus)

path <- "Excel/800-899/878/878 Complex Regex Extraction.xlsx"
input <- read_excel(path, range = "A2:A40")
test <- read_excel(path, range = "C2:E40")

ref_pattern = "(?<=REF\\-)\\d{4}"
email_pattern = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
web_pattern = "(?:https?:\\/\\/|www\\.)[A-Za-z0-9-]+(?:\\.[A-Za-z0-9-]+)*\\.[A-Za-z]{2,}"

result = input %>%
  mutate(
    REF = str_extract(`Log Data`, ref_pattern) %>% as.numeric(),
    `Mail ID` = str_extract(`Log Data`, email_pattern),
    `Web Address` = str_extract(`Log Data`, web_pattern)
  ) %>%
  select(-`Log Data`)


all.equal(result, test, check.attributes = FALSE)
  • Logic: Read the workbook ranges needed for the challenge; Derive the required intermediate columns; Parse the packed text or string structure.
  • Strengths: The solution stays close to the text pattern itself, which makes the extraction logic easy to audit.
  • Areas for Improvement: The solution assumes the workbook layout and selected ranges remain stable, so any structural change in the sheet would require small adjustments.
  • Gem: A small number of well-targeted text patterns does most of the heavy lifting.
import pandas as pd

path = "Excel/800-899/878/878 Complex Regex Extraction.xlsx"
input = pd.read_excel(path, usecols="A", skiprows=1, nrows=39)
test = pd.read_excel(path, usecols="C:E", skiprows=1, nrows=39)

ref_pattern = r"(?<=REF-)(\d{4})"
email_pattern = r"([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
web_pattern = r"((?:https?:\/\/|www\.)[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,})"

result = pd.DataFrame()
result['REF'] = input['Log Data'].str.extract(ref_pattern)[0].astype("int64")
result['Mail ID'] = input['Log Data'].str.extract(email_pattern)[0]
result['Web Address'] = input['Log Data'].str.extract(web_pattern)[0]

print(result.equals(test))
# True

The Python version expresses the core extraction rule directly and keeps the pattern matching easy to review.

Difficulty Level

Medium

The individual steps are manageable, but the correct transformation pattern is not obvious from the raw data.