library(tidyverse)
library(readxl)
path = "Excel/637 Insert Dash At Non Consecutive Character.xlsx"
input = read_excel(path, range = "A1:A8")
test = read_excel(path, range = "B1:B8")
process_string = function(string) {
string %>%
str_split("") %>% # Split string into characters
unlist() %>% # Flatten the list
tibble(char = .) %>% # Create a tibble
mutate(value = ifelse(is.na(as.numeric(char)),
match(char, LETTERS), # Convert letters to positions (A=1, B=2, ...)
as.numeric(char))) %>% # Keep numeric values as-is
mutate(dash = ifelse(value - lag(value) != 1, "-", "")) %>% # Check for non-consecutive
replace_na(list(dash = "")) %>% # Replace NA dash values with ""
unite("char", c("dash", "char"), sep = "") %>% # Combine dash with character
pull(char) %>% # Extract character vector
paste0(collapse = "") # Reconstruct the processed string
}
result = input %>%
mutate(processed = map_chr(String, process_string))
print(result$processed == test$`Answer Expected`)Excel BI - Excel Challenge 637

Challenge Description
🔰 Given alphanumeric strings, insert a dash if two characters are not consecutive in increasing order.
Ex. ABY => AB-Y (A & B are consecutive in increasing order and B & Y are not consecutive, hence dash after AB). BA2R => B-A-2-R (B & A are consecutive but not in increasing order. A and 2 are not consecutive. 2 and R are not consecutive)
🔗 Link to Excel file: 👉https://lnkd.in/dZt6XuE3
Solutions
Logic:
Split Characters: The string is split into individual characters using
str_split.- Map to Values:
Alphabetic characters are mapped to their positions in the English alphabet using
match(char, LETTERS).Numeric characters are converted to their numeric values using
as.numeric.
Identify Non-Consecutive Pairs:
The difference between the numeric values of consecutive characters is calculated using
lag.A dash is inserted if the difference is not
1.
Reconstruct the String:
- Characters and dashes are combined using
uniteand reconstructed into the final string.
- Characters and dashes are combined using
Strength:
Flexibility:
Handles both alphabetic and numeric characters dynamically, ensuring wide applicability.
Integration with
tidyverse:- The use of
mutate,map_chr, anduniteensures clean, modular, and readable code.
- The use of
Dynamic Logic:
- Automatically processes strings of varying lengths and structures without manual intervention.
Area for Improvement:
Handling of Edge Cases:
The original logic failed for cases like
ABAB, where the same pair (AB) is processed multiple times, leading to redundant or missed dashes.Efficiency:
For very long strings, the multiple
mutatesteps could be computationally expensive.Complexity:
- While readable, the pipeline could be simplified or modularized into smaller functions for better maintainability.
Gem:
Numeric Mapping for Characters:
- The use of
match(char, LETTERS)to map alphabetic characters to their numeric positions is elegant and avoids manual encoding or additional libraries.
- The use of
import pandas as pd
import numpy as np
path = "637 Insert Dash At Non Consecutive Character.xlsx"
input = pd.read_excel(path, usecols="A", nrows=8)
test = pd.read_excel(path, usecols="B", nrows=8)
def process_string(string):
result = [string[0]]
for i in range(1, len(string)):
if (ord(string[i]) - ord(string[i-1]) != 1):
result.append('-')
result.append(string[i])
return ''.join(result)
input['processed'] = input.iloc[:, 0].apply(process_string)
print(test['Answer Expected'] == input['processed'])
# 0 True
# 1 True
# 2 True
# 3 True
# 4 True
# 5 True
# 6 False AB in this string can be pair once. Logic:
Split Characters: The string is iterated character by character.
Compare Consecutive Characters:
The difference between ASCII values (
ord) of consecutive characters is calculated.A dash (
-) is inserted if the difference is not1.
Reconstruct the String:
- Characters and dashes are appended iteratively to build the final string.
Strength:
Explicit Logic:
- The use of
ordmakes the comparison between characters straightforward and intuitive.
- The use of
Iterative Approach:
- Processes each character pair exactly once, making the logic simple to follow and efficient.
Flexibility:
- Handles mixed alphanumeric strings of varying lengths seamlessly.
Area for Improvement:
Edge Case Handling:
- The original implementation failed for cases like
ABAB, where the same pair (AB) is redundantly processed, leading to incorrect results.
- The original implementation failed for cases like
Code Modularity:
- The core logic could be broken into reusable helper functions for splitting, comparison, and reconstruction.
Gem:
Efficient ASCII Comparison:
- The use of
ordfor character comparison is concise and ensures seamless handling of both letters and numbers without additional logic.
- The use of
Difficulty Level
This task is moderate:
Requires grouping and aggregation across multiple dimensions.
Involves filtering and transforming data dynamically based on conditions.