library(tidyverse)
library(readxl)
path <- "Excel/900-999/918/918 Extract Data.xlsx"
input <- read_excel(path, range = "B1:B1", col_names = FALSE)
test <- read_excel(path, range = "A3:C13")
R <- function(pattern) str_extract_all(input %>% pull(), pattern)[[1]]
result = tibble(
`Ref Number` = R("REF-\\d{4}"),
`E Mail ID` = R("[\\w.-]+@[\\w.-]+"),
`Website Address` = R("(www\\.|https?://)[\\w./-]*[\\w/-]")
)
all(test == result)
## [1] TRUEExcel BI - Excel Challenge 918
excel-challenges
excel-formulas
🔰 918 Extract Data.xlsx says: > Extract REF ID, E Mail Id and Website Addresses from the given data.

Challenge Description
🔰 The prompt in 918 Extract Data.xlsx says: Extract REF ID, E Mail Id and Website Addresses from the given data. The workbook provides one large text block containing mixed content. The goal is to extract three parallel lists:
Solutions
- Logic: Read the source text as one string.; Define one extraction helper that runs a regex and returns all matches.; Use a dedicated regex for reference IDs..
- Strengths: This puzzle works because each entity type has a recognizable signature: - reference numbers begin with
REF-and four digits, - emails contain a local part,@, and a domain, - websites begin withwww.orhttp(s)://. - Areas for Improvement: The approach assumes the workbook structure and naming conventions stay stable, so any changed input shape would need minor adjustments.
- Gem: - reference numbers begin with
REF-and four digits, - emails contain a local part,@, and a domain, - websites begin withwww.orhttp(s)://.
import pandas as pd
import re
path = "Excel/900-999/918/918 Extract Data.xlsx"
input_data = pd.read_excel(path, usecols="B", nrows=1, header=None)
test = pd.read_excel(path, usecols="A:C", skiprows=2, nrows=11)
text = input_data.iloc[0, 0]
def R(pattern):
return re.findall(pattern, text)
result = pd.DataFrame({
"Ref Number": R(r"REF-\d{4}"),
"E Mail ID": R(r"[\w.-]+@[\w.-]+"),
"Website Address": R(r"(?:www\.|https?://)[\w./-]*[\w/-]")
})
print(test.equals(result))
# Visually i see no difference.The Python version follows the same structure: load the full text source.; run re.findall() with one pattern per entity type..
Difficulty Level
Easy
Once the core pattern is recognized, the implementation is short and direct.