Omid - Challenge 6

Published

March 24, 2026


Challenge Description

🔰 3- Calculate the distance of each value from the center of each cluster (the absolute difference; for value 5 and cluster center 2, this is |5-2|=3) 4- Calculate the average of…
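Step 3 can be illustrated with a short Python sketch. The values and centers below are made up for illustration and are not taken from the workbook; the variable names are assumptions as well.

```python
import numpy as np

# Hypothetical sample values and cluster centers (not from the workbook).
values = np.array([5.0, 1.0, 9.0])
centers = np.array([2.0, 9.0])

# Broadcasting builds an (n_values, n_centers) matrix of absolute differences:
# row i, column j holds |values[i] - centers[j]|.
distances = np.abs(values[:, None] - centers[None, :])

# The nearest center for each value is the column with the smallest distance.
nearest = distances.argmin(axis=1) + 1  # 1-based cluster numbers

print(distances)
print(nearest)
```

The first row of `distances` starts with 3.0, matching the |5-2|=3 example in the description.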

Solutions

library(tidyverse)
library(readxl)
library(FNN)

path = "files/CH-006.xlsx"
input = read_excel(path, range = "B2:B12")

# run k-means with k = 2 on the input values and put the cluster number in column C, then k = 3 into D, and k = 4 into E

output = input %>%
  mutate(C = kmeans(input, centers = 2)$cluster,
         D = kmeans(input, centers = 3)$cluster,
         E = kmeans(input, centers = 4)$cluster)
  • Logic:

    • Reads the input column from range B2:B12 of the workbook with read_excel()

    • Runs kmeans() with 2, 3, and 4 centers and stores the resulting cluster numbers in columns C, D, and E

  • Strengths:

    • The R solution relies on the built-in kmeans() function, so the whole transformation fits in a single mutate() call.
  • Areas for Improvement:

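    • The file path and the range B2:B12 are hard-coded, so the code breaks if the sheet layout changes.

    • kmeans() starts from randomly chosen centers, so cluster numbers can differ between runs unless set.seed() is called first.

    • library(FNN) is loaded but not used by the code shown.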
  • Gem:

    • Taking kmeans(...)$cluster inside mutate() attaches each clustering directly as a new column, with no intermediate objects.
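When comparing cluster columns like C, D, and E against an expected answer, it helps to remember that k-means cluster numbers are arbitrary labels: two runs can find the same groups but number them differently. A minimal Python sketch of one way to make the numbering comparable, by renumbering clusters in order of their mean, follows. The helper `relabel_by_center` and the sample data are hypothetical and not part of either solution.

```python
import numpy as np

def relabel_by_center(values, labels):
    """Renumber clusters 1..k so that cluster 1 has the smallest mean."""
    x = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)  # sorted unique original labels
    means = np.array([x[labels == i].mean() for i in ids])
    order = ids[np.argsort(means)]  # original labels, sorted by cluster mean
    mapping = {old: new for new, old in enumerate(order, start=1)}
    return np.array([mapping[label] for label in labels])

# Hypothetical values and arbitrary cluster numbers.
values = [1, 2, 10, 11, 20]
labels = [3, 3, 1, 1, 2]
print(relabel_by_center(values, labels))
```

After relabeling, the group containing the smallest values is always cluster 1, whatever numbers the clustering routine originally assigned.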
import numpy as np
import pandas as pd

path = "CH-006.xlsx"
# Header in B2, values in B3:B12 (the same cells as the R range "B2:B12")
input_data = pd.read_excel(path, usecols="B", skiprows=1, nrows=10)

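def kmeans_1d(values, k, max_iter=100):
    """Cluster 1-D values into k groups and return 1-based cluster labels."""
    x = np.asarray(values, dtype=float)
    # Deterministic start: k evenly spaced interior quantiles of the data.
    centers = np.quantile(x, np.linspace(0, 1, k + 2)[1:-1])
    labels = np.zeros(len(x), dtype=int)
    for _ in range(max_iter):
        # Assignment step: index of the nearest center by absolute difference.
        new_labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        new_centers = centers.copy()
        for i in range(k):
            pts = x[labels == i]
            if len(pts):  # an empty cluster keeps its previous center
                new_centers[i] = pts.mean()
        if np.allclose(new_centers, centers):
            break  # centers converged
        centers = new_centers
    return labels + 1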

result = input_data.copy()
result["C"] = kmeans_1d(result.iloc[:, 0], 2)
result["D"] = kmeans_1d(result.iloc[:, 0], 3)
result["E"] = kmeans_1d(result.iloc[:, 0], 4)
print(result)
  • Logic:

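    • Reads column B of the workbook, skipping the first row so that B2 becomes the header

    • Alternates between assigning each value to its nearest center and recomputing each center as the mean of its cluster, stopping when the labels or centers no longer change or after max_iter passes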

  • Strengths:

    • The Python version implements one-dimensional k-means directly in NumPy and starts from quantile-based centers, so repeated runs give the same labels.
  • Areas for Improvement:

    • The column letter and row counts are hard-coded in read_excel(), so a change in the sheet layout requires adjusting usecols, skiprows, and nrows.
  • Gem:

    • The assignment step computes all value-to-center distances at once with NumPy broadcasting (x[:, None] - centers[None, :]) instead of looping over the values.
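The starting centers used by kmeans_1d can be inspected in isolation. The sketch below reuses the same quantile expression on made-up data (the array `x` is an assumption, not the workbook's values) and performs a single assignment and update pass.

```python
import numpy as np

# Hypothetical data with two obvious groups (not from the workbook).
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
k = 2

# Same initialization as kmeans_1d: drop the 0 and 1 endpoints so that the
# k remaining probabilities are evenly spaced inside the data range.
probs = np.linspace(0, 1, k + 2)[1:-1]
centers = np.quantile(x, probs)

# One assignment pass: nearest center by absolute difference (0-based index).
labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)

# One update pass: each center moves to the mean of its assigned values.
new_centers = np.array([x[labels == i].mean() for i in range(k)])

print(probs, centers, labels, new_centers)
```

For k = 2 the probabilities are 1/3 and 2/3, so the starting centers fall inside the lower and upper thirds of the data rather than at its extremes.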

Difficulty Level

This task is moderate:

  • The clustering rule is easy to state, but producing the expected cluster columns for 2, 3, and 4 clusters still takes careful implementation.