Excel BI - Excel Challenge 936

excel-challenges
excel-formulas
🔰 Extract surname, first name, and ID from mixed free-text strings.
Published

March 23, 2026

Illustration for Excel BI - Excel Challenge 936

Challenge Description

🔰 Extract Surname, First Name and ID.

Solutions

library(tidyverse)
library(readxl)

path <- "900-999/936/936 Extract Name ID.xlsx"
input <- read_excel(path, range = "A2:A11")
test <- read_excel(path, range = "C2:E19")

pattern <- "([A-Z]+)\\s+([A-Z]+)\\s+\\((\\d+)\\)"

result <- input |>
  drop_na() |>
  mutate(matches = str_extract_all(Data, pattern)) |>
  unnest(matches) |>
  mutate(
    Surname = str_match(matches, pattern)[, 2],
    `First Name` = str_match(matches, pattern)[, 3],
    ID = as.integer(str_match(matches, pattern)[, 4])
  ) |>
  select(Surname, `First Name`, ID)

all.equal(result, test)
# [1] TRUE
  • Logic:

    • Define one regex pattern that matches SURNAME FIRSTNAME (ID).

    • Extract all matches from each text row.

    • Expand those matches into separate rows.

    • Parse each match into surname, first name, and ID.

  • Strengths:

    • Pattern-Driven:

      • The solution ignores delimiters and focuses only on valid person records.
    • Handles Multiple Entities per Row:

      • str_extract_all() plus unnest() makes multi-record rows easy to process.
    • Compact Workflow:

      • The full extraction lives in one tidy pipeline.
  • Areas for Improvement:

    • Pattern Assumptions:

      • The regex assumes names are uppercase and single-token on both sides.
  • Gem:

    • Extracting entities directly by pattern is much cleaner than trying to split on inconsistent separators.
import pandas as pd
import re

path = "900-999/936/936 Extract Name ID.xlsx"
input_df = pd.read_excel(path, usecols="A", skiprows=1, nrows=10, header=0)
test = pd.read_excel(path, usecols="C:E", skiprows=1, nrows=18, header=0)

pattern = r'([A-Z]+)\s+([A-Z]+)\s+\((\d+)\)'

records = []
for val in input_df["Data"].dropna():
    for m in re.finditer(pattern, val):
        records.append({"Surname": m.group(1), "First Name": m.group(2), "ID": int(m.group(3))})

result = pd.DataFrame(records).reset_index(drop=True)

print(result.equals(test))
# True
  • Logic:

    • Define a regex with three capture groups.

    • Scan each text row for all person-pattern matches.

    • Extract surname, first name, and ID from each match.

    • Build the final table from the extracted records.

  • Strengths:

    • Explicit Match Handling:

      • re.finditer() makes each extracted record easy to inspect.
    • Robust Against Delimiter Noise:

      • Slashes, dashes, and stars do not matter because the pattern targets the records themselves.
    • Clean Output Construction:

      • The result dataframe is assembled directly from parsed entities.
  • Areas for Improvement:

    • Hard-Coded Name Format:

      • The regex would need extension for more complex names.
  • Gem:

    • Going after the entity pattern instead of the separators is the smart simplification.

Difficulty Level

This task is moderate:

  • Requires regex extraction with multiple entities per row.

  • Involves turning semi-structured text into a normalized table.