
Building in Public

log entry No. 03

Night City Voices

From MERN to Edge ML: Building a Hybrid Tone Classifier for Night City Voices

“You get to a certain age, you drop all your illusions. Life just gets easier from there.” — Viktor Vektor

Is that hopeful? Cynical? It sounds like wisdom. It lands like defeat.

That’s the problem. And it’s harder to solve than it looks.

Night City Voices started as a #100DaysOfCode experiment: a React frontend hitting an Express REST API, backed by MongoDB, serving random Cyberpunk 2077 quotes. Somewhere along the way, classifying the tone of those quotes became the real challenge. And solving it led to something I didn’t plan: a hybrid, edge-inferred ML system with no Python server in the production path.

The Real Problem: Tone ≠ Sentiment

Cyberpunk dialogue doesn’t behave like the text in typical sentiment datasets. A character can say something brutal that reads hopeful, something optimistic that lands cynical, or something calm that feels dark. Binary positive/negative models collapse immediately. The goal shifted:

Given any quote from Cyberpunk 2077, classify it as DARK, HOPEFUL, or CYNICAL, accurately enough to be useful and honestly enough to surface ambiguity.

That’s when the architecture had to change.

The Evolution

Phase 1 — Rule-Based Heuristics

The first classifier was pure JavaScript: three word lists (DARK_WORDS, HOPEFUL_WORDS, CYNICAL_WORDS), count matches, pick the strongest signal.
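A minimal sketch of that count-and-pick approach. The list names come from the project, but the words inside them are placeholders chosen for illustration, not the real lists:

```javascript
// Placeholder contents: the post names the three lists but does not show them.
const DARK_WORDS = ["dead", "blood", "grave", "burn"];
const HOPEFUL_WORDS = ["dreams", "hope", "future", "free"];
const CYNICAL_WORDS = ["corpo", "illusions", "sucker"];

// Lowercase and split on anything that is not a letter or straight apostrophe.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z']+/).filter(Boolean);
}

// Number of tokens that appear in the given word list.
function countMatches(tokens, words) {
  return tokens.filter((t) => words.includes(t)).length;
}

function classifyByRules(text) {
  const tokens = tokenize(text);
  const scores = {
    dark: countMatches(tokens, DARK_WORDS),
    hopeful: countMatches(tokens, HOPEFUL_WORDS),
    cynical: countMatches(tokens, CYNICAL_WORDS),
  };
  // Strongest signal wins; ties (including zero matches) resolve to the first key, "dark".
  return Object.entries(scores).reduce((best, cur) => (cur[1] > best[1] ? cur : best))[0];
}

console.log(classifyByRules("dreams cost eddies"));
```

With these placeholder lists the contrast quote comes back as hopeful, because only the vocabulary is counted and the structure is ignored.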

It worked — until sarcasm entered the room.

Quotes with hopeful vocabulary but dark structure? Misclassified. Quotes with contrast (“dreams cost eddies”)? Broken.

This was 44% accuracy territory. Cynical recall was worse.

Fine for a frontend gimmick. Not fine for a portfolio piece.

Phase 2 — TF-IDF + Logistic Regression

Training moved offline to Python:

Model           | Accuracy | Dark recall | Hopeful recall | Cynical recall
----------------|----------|-------------|----------------|---------------
TF-IDF + LogReg | 54%      | 43%         | 55%            | 64%

Better. Still not good enough.

But here’s the key decision that shaped everything else: inference would not run on a Python server. The model would export its weights as JSON, and a pure-JS implementation would handle TF-IDF vectorization, logistic regression softmax, and probability scoring. All at the edge.
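Here is a rough sketch of what that pure-JS inference path could look like. The JSON layout (classes, vocab, idf, coef, intercept) is an assumption modeled on scikit-learn's fitted attributes, the weights are toy numbers, and the tokenizer only approximates scikit-learn's default (ASCII words of two or more characters, L2-normalized TF-IDF, multinomial softmax):

```javascript
// Toy weights in an assumed JSON layout; real exported weights would be far larger.
const model = {
  classes: ["cynical", "dark", "hopeful"],
  vocab: { dreams: 0, eddies: 1, grave: 2 },
  idf: [1.5, 1.2, 1.8],
  coef: [
    [0.2, 1.1, -0.4],
    [-0.6, -0.2, 1.3],
    [0.9, -0.7, -0.8],
  ],
  intercept: [0, 0, 0],
};

// TF-IDF: raw term counts times idf, then L2 normalization.
function vectorize(text, m) {
  const vec = new Array(m.idf.length).fill(0);
  const tokens = text.toLowerCase().match(/\b\w\w+\b/g) || [];
  for (const t of tokens) {
    if (Object.prototype.hasOwnProperty.call(m.vocab, t)) {
      const i = m.vocab[t];
      vec[i] += m.idf[i];
    }
  }
  const norm = Math.hypot(...vec);
  return norm === 0 ? vec : vec.map((v) => v / norm);
}

// Numerically stable softmax.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// One linear score per class, then softmax into a probability per class name.
function predictProba(text, m) {
  const x = vectorize(text, m);
  const logits = m.coef.map((row, k) =>
    row.reduce((acc, w, j) => acc + w * x[j], m.intercept[k])
  );
  const probs = softmax(logits);
  return Object.fromEntries(m.classes.map((c, k) => [c, probs[k]]));
}

console.log(predictProba("dreams cost eddies", model));
```

Out-of-vocabulary words are simply dropped, so a quote with no known terms falls back to whatever the intercepts alone predict.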

No Flask. No FastAPI. No server.

That constraint forced the interesting engineering work.

The Three-Layer Hybrid System

The final architecture is layered by confidence, not complexity. Each layer only activates if the previous one can’t resolve with enough certainty.

fetchTone(text)
      │
      ▼
Layer 1 — Embedded Lookup (MiniLM)
      │
      ▼
Layer 2 — TF-IDF + LogReg (Cloudflare Worker)
      │
      ▼
Layer 3 — Deterministic Rule Fallback
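The chain above could be wired roughly like this. The three layer functions are injected as hypothetical stand-ins, and the 0.5 confidence cutoff is an assumed value, since the post does not state the real threshold:

```javascript
// layers.lookup(text)  -> label string or null          (Layer 1, local)
// layers.worker(text)  -> Promise of {tone: probability} (Layer 2, network)
// layers.rules(text)   -> label string, never fails      (Layer 3, local)
async function fetchTone(text, layers, minConfidence = 0.5) {
  // Layer 1: a known quote is answered without any network call.
  const known = layers.lookup(text);
  if (known) return { label: known, source: "lookup" };

  // Layer 2: edge inference; errors and low confidence both fall through.
  try {
    const probs = await layers.worker(text);
    const [label, p] = Object.entries(probs).sort((a, b) => b[1] - a[1])[0];
    if (p >= minConfidence) return { label, source: "worker", probs };
  } catch (err) {
    // Network error, rate limit, or malformed response: ignore and fall back.
  }

  // Layer 3: deterministic rules always produce a label.
  return { label: layers.rules(text), source: "rules" };
}
```

Passing the layers in as arguments keeps the control flow testable without a browser or a deployed Worker.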

Layer 1 — MiniLM Semantic Lookup (Client-Side)

Rather than calling an API for every request, all 155 labeled quotes are pre-embedded using all-MiniLM-L6-v2 (384-dim) and injected directly into index.html as EMBEDDED_LABELS.

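If the incoming quote exists in the corpus, the lookup resolves it client-side with no network call. The numbers for the MiniLM + LogReg model: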

Model           | Accuracy | Dark recall | Hopeful recall | Cynical recall
----------------|----------|-------------|----------------|---------------
MiniLM + LogReg | 67%      | 36%         | 91%            | 79%

67% sounds modest, but on a 155-sample, 3-class dataset where the classes genuinely overlap, this is about the ceiling I’d expect before adding more labeled data. The fallback chain handles the rest. This layer deliberately trades page weight for eliminating network latency on known quotes.
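As an illustration of the lookup step only, here is a sketch that assumes EMBEDDED_LABELS is a plain object keyed by normalized quote text. The real injected structure is not shown in this post, and the entry below is a placeholder:

```javascript
// Assumed shape and placeholder entry; the build step would inject the real data.
const EMBEDDED_LABELS = {
  "dreams cost eddies": "cynical",
};

// Normalize so trivial differences in case, quotes, or spacing still match.
function normalizeQuote(text) {
  return text
    .toLowerCase()
    .replace(/[\u2018\u2019]/g, "'") // curly to straight apostrophes
    .replace(/\s+/g, " ")
    .trim();
}

// Returns the stored label for a known quote, or null so the caller can fall through.
function lookupTone(text) {
  const key = normalizeQuote(text);
  return Object.prototype.hasOwnProperty.call(EMBEDDED_LABELS, key)
    ? EMBEDDED_LABELS[key]
    : null;
}

console.log(lookupTone("  Dreams  cost EDDIES "));
```

Returning null rather than throwing lets the next layer take over for any quote outside the corpus.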

Layer 2 — Edge Inference (TF-IDF Worker)

If a quote isn’t in the embedded lookup, the frontend sends a POST /tone request to a Cloudflare Worker.

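Inside the Worker, a pure-JS implementation runs TF-IDF vectorization, logistic regression, and softmax over the exported JSON weights. Then one decision rule applies: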

If cynical probability ≥ 0.35 → classify cynical. Otherwise return the full probability vector.
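That decision rule is small enough to sketch directly. The 0.35 threshold is from the post; the request body field name (text), the response shape, and the injected predictProba function are assumptions made for illustration:

```javascript
const CYNICAL_THRESHOLD = 0.35; // value stated in the post

// probs: { dark, hopeful, cynical } as produced by the classifier.
function decideTone(probs) {
  if (probs.cynical >= CYNICAL_THRESHOLD) {
    return { label: "cynical", probs };
  }
  // No forced label: the caller receives the full vector and decides.
  return { probs };
}

// Hypothetical wiring for a POST /tone handler; predictProba is passed in.
async function handleTone(request, predictProba) {
  if (request.method !== "POST") {
    return new Response("Method Not Allowed", { status: 405 });
  }
  const { text } = await request.json();
  const body = JSON.stringify(decideTone(predictProba(text)));
  return new Response(body, { headers: { "Content-Type": "application/json" } });
}

console.log(decideTone({ dark: 0.4, hopeful: 0.24, cynical: 0.36 }).label);
```

Note that the rule can label a quote cynical even when another class has a higher probability, which biases the Worker toward cynical on purpose.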

No server round-trip beyond the Worker. No Python runtime anywhere in the inference path.

Layer 3 — Deterministic Rule-Based Fallback

If the Worker fails, hits a rate limit, or returns max_prob < threshold, the system falls back to CP77-specific word lists with contrast detection: hopeful + dark signals → cynical boost.
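A sketch of the contrast rule, with placeholder word lists and an assumed boost of 2 (neither the real CP77 lists nor the real weighting appear in this post):

```javascript
// Placeholder lists for illustration only.
const LISTS = {
  dark: ["dead", "grave", "burn", "cost"],
  hopeful: ["dreams", "hope", "free"],
  cynical: ["corpo", "illusions"],
};
const CONTRAST_BOOST = 2; // hypothetical weight

function fallbackTone(text) {
  const tokens = text.toLowerCase().split(/[^a-z']+/).filter(Boolean);
  const score = { dark: 0, hopeful: 0, cynical: 0 };
  for (const t of tokens) {
    for (const tone of Object.keys(LISTS)) {
      if (LISTS[tone].includes(t)) score[tone] += 1;
    }
  }
  // Contrast: hopeful and dark vocabulary in the same quote reads as cynical.
  if (score.hopeful > 0 && score.dark > 0) score.cynical += CONTRAST_BOOST;
  // Always returns a label; with no matches, the tie resolves to "dark" (first key).
  return Object.keys(score).reduce((best, tone) => (score[tone] > score[best] ? tone : best));
}

console.log(fallbackTone("dreams cost eddies"));
```

Because the function has no failure path, it can serve as the floor for the whole chain.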

This layer lives entirely in the frontend. No deploy required. No dependency. No AI cost.

It exists for one reason: a request must never return without a label. Correctness floors matter.

Ambiguity Is a Feature

Most classifiers silently return argmax.

This one doesn’t.

If the top two probabilities fall within a threshold of each other, the UI shows both:

DARK | CYNICAL

Low confidence is information. A quote that reads dark and cynical should say so. This small design decision makes the system feel more honest — and prevents false certainty.
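A small sketch of that dual-label rule. The 0.1 margin is an assumed value, since the actual threshold is not given here:

```javascript
const AMBIGUITY_MARGIN = 0.1; // hypothetical; tune against real predictions

// probs: { dark, hopeful, cynical }. Returns "DARK" or "DARK | CYNICAL" style labels.
function toneLabel(probs, margin = AMBIGUITY_MARGIN) {
  const ranked = Object.entries(probs).sort((a, b) => b[1] - a[1]);
  const [first, second] = ranked;
  // Show both labels when the runner-up is within the margin of the winner.
  if (first[1] - second[1] <= margin) {
    return `${first[0].toUpperCase()} | ${second[0].toUpperCase()}`;
  }
  return first[0].toUpperCase();
}

console.log(toneLabel({ dark: 0.42, cynical: 0.38, hopeful: 0.2 })); // DARK | CYNICAL
console.log(toneLabel({ dark: 0.7, cynical: 0.2, hopeful: 0.1 }));   // DARK
```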

The Real Engineering Problem (Revisited)

How do you train in Python and infer in a zero-dependency JavaScript edge environment with no server in between?

The key piece that made it work: training lives offline, inference lives at the edge, and the two communicate only through serialized weights.

Failure Mode Thinking

Failure                 | Likelihood | Impact   | Mitigation
------------------------|------------|----------|-------------------------------
Worker inference error  | Low        | Moderate | Layer 3 fallback
Rate limit exceeded     | Low–Medium | Low      | Silent fallback
Weight/vocab drift      | Medium     | High     | Dimension validation on startup
Dark misclassification  | High       | Low      | Dual-label ambiguity display
Build injection failure | Low        | High     | Token validation during build
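For the weight/vocab drift row, dimension validation on startup could look something like this. The model layout (classes, vocab, idf, coef, intercept) is an assumed JSON shape, not the project's confirmed format:

```javascript
// Throws if the exported pieces no longer agree on their dimensions.
function validateModel(m) {
  const nFeatures = Object.keys(m.vocab).length;
  const nClasses = m.classes.length;
  const errors = [];

  if (m.idf.length !== nFeatures) {
    errors.push(`idf has ${m.idf.length} entries, vocab has ${nFeatures}`);
  }
  if (m.coef.length !== nClasses) {
    errors.push(`coef has ${m.coef.length} rows, expected ${nClasses}`);
  }
  m.coef.forEach((row, k) => {
    if (row.length !== nFeatures) {
      errors.push(`coef row ${k} has ${row.length} weights, expected ${nFeatures}`);
    }
  });
  if (m.intercept.length !== nClasses) {
    errors.push(`intercept has ${m.intercept.length} entries, expected ${nClasses}`);
  }
  // Vocab indices must be unique integers covering 0..nFeatures-1.
  const indices = new Set(Object.values(m.vocab));
  const inRange = [...indices].every((i) => Number.isInteger(i) && i >= 0 && i < nFeatures);
  if (indices.size !== nFeatures || !inRange) {
    errors.push("vocab indices are not a clean 0..n-1 range");
  }

  if (errors.length > 0) throw new Error(`Model validation failed: ${errors.join("; ")}`);
  return true;
}
```

Running a check like this once at load time turns a silent mismatch between retrained weights and an old vocabulary into an immediate, visible error.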

Dark recall remains low by design. Dark quotes are linguistically sparse — they lack contrast markers and resist surface features. Improving that requires more labeled data, not architectural tweaks.

What This Demonstrates

This is a small system with narrow scope. But it shows how far a project can drift from its starting point: it began as a MERN quote generator during #100DaysOfCode and ended as a hybrid, edge-inferred ML system with layered guarantees.

Different UI. Completely different architecture underneath.

Sometimes you only see the journey looking back.