Excel BI - Excel Challenge 936

excel-challenges

excel-formulas

🔰 Extract surname, first name, and ID from mixed free-text strings.

Published

March 23, 2026

Illustration for Excel BI - Excel Challenge 936

Challenge Description

🔰 Extract Surname, First Name and ID.

Solutions

library(tidyverse)
library(readxl)

path <- "900-999/936/936 Extract Name ID.xlsx"
input <- read_excel(path, range = "A2:A11")
test <- read_excel(path, range = "C2:E19")

pattern <- "([A-Z]+)\\s+([A-Z]+)\\s+\\((\\d+)\\)"

result <- input |>
  drop_na() |>
  mutate(matches = str_extract_all(Data, pattern)) |>
  unnest(matches) |>
  mutate(
    Surname = str_match(matches, pattern)[, 2],
    `First Name` = str_match(matches, pattern)[, 3],
    ID = as.integer(str_match(matches, pattern)[, 4])
  ) |>
  select(Surname, `First Name`, ID)

all.equal(result, test)
# [1] TRUE

Logic:
- Define one regex pattern that matches SURNAME FIRSTNAME (ID).
- Extract all matches from each text row.
- Expand those matches into separate rows.
- Parse each match into surname, first name, and ID.
Strengths:
- Pattern-Driven:
  - The solution ignores delimiters and focuses only on valid person records.
- Handles Multiple Entities per Row:
  - str_extract_all() plus unnest() makes multi-record rows easy to process.
- Compact Workflow:
  - The full extraction lives in one tidy pipeline.
Areas for Improvement:
- Pattern Assumptions:
  - The regex assumes names are uppercase and single-token on both sides.
Gem:
- Extracting entities directly by pattern is much cleaner than trying to split on inconsistent separators.

import pandas as pd
import re

path = "900-999/936/936 Extract Name ID.xlsx"
input_df = pd.read_excel(path, usecols="A", skiprows=1, nrows=10, header=0)
test = pd.read_excel(path, usecols="C:E", skiprows=1, nrows=18, header=0)

pattern = r'([A-Z]+)\s+([A-Z]+)\s+\((\d+)\)'

records = []
for val in input_df["Data"].dropna():
    for m in re.finditer(pattern, val):
        records.append({"Surname": m.group(1), "First Name": m.group(2), "ID": int(m.group(3))})

result = pd.DataFrame(records).reset_index(drop=True)

print(result.equals(test))
# True

Logic:
- Define a regex with three capture groups.
- Scan each text row for all person-pattern matches.
- Extract surname, first name, and ID from each match.
- Build the final table from the extracted records.
Strengths:
- Explicit Match Handling:
  - re.finditer() makes each extracted record easy to inspect.
- Robust Against Delimiter Noise:
  - Slashes, dashes, and stars do not matter because the pattern targets the records themselves.
- Clean Output Construction:
  - The result dataframe is assembled directly from parsed entities.
Areas for Improvement:
- Hard-Coded Name Format:
  - The regex would need extension for more complex names.
Gem:
- Going after the entity pattern instead of the separators is the smart simplification.

Difficulty Level

This task is moderate:

Requires regex extraction with multiple entities per row.
Involves turning semi-structured text into a normalized table.

Challenge Description

Solutions

Difficulty Level

Related Challenges