Memory Squared

Answering the Call to Help Telenor Launch Their “Missed Calls” Campaign – Part 2

Rafał Rybarczyk

Memory Shared is a new series of articles where we explore some of the most interesting case studies from our client projects. Like actual memories, each project memory we share is multilayered, so we've decided to make each memory a two-parter. Part 1 explores the business context and product strategy behind each project, while Part 2 explores the technical solutions we came up with, featuring deep dives into the frameworks, workarounds, and coding heroics of our team. Enjoy the memories and don't hesitate to share!


In Part 2 of Memory Shared: Telenor, I'm excited to dive a bit deeper into the technical side of the solution we came up with for the Missed Calls campaign. As you may remember from Part 1, Nord DDB approached us with a deceptively simple request on behalf of their client, the mobile carrier Telenor: build a webpage that lets Telenor's users enter their phone number and voicemail PIN, then retrieve their voicemail messages as downloadable audio files. On paper it seemed simple, but if you've ever tried to use a mobile voicemail system, you'll understand what happened next.

“Missed Calls” Project Overview  

Client: Telenor & Nord DDB
Year: 2024
Project Description: A frontend web app and backend voicemail API to retrieve voicemail messages for Telenor Sweden users
Team Involved: Tech Lead, Frontend Developer, Backend Developer, Project Manager
Technologies: Node.js, MySQL, custom voice interface integration (Telnyx), React.js

Answering the Call: Understanding the Problem

Once we were fully onboarded to the project, we dug into the technical documentation and discovered that Telenor exposed no API, no documentation, and no programmatic access whatsoever. The only interface available was a phone number that users could dial, navigating a Swedish-language Interactive Voice Response (IVR) system and pressing the digits corresponding to each function. That was the entire integration surface.

Before we could even think about authentication or message retrieval, we had to answer a more basic question: does every phone number have a voicemail inbox? There was no endpoint to check. The only way to find out was to behave like a human caller. So we started building what we ended up calling a "Probe Call." To get the probe call system off the ground, we needed to figure out how to place and control phone calls from a backend service. We quickly realized we needed more than just outbound calling functionality, so we gathered the specs and requirements our probe call system would have to meet.

Requirements & Functionality of our Probe Call System:  

    • Programmatically dial numbers and send DTMF tones
    • Receive real-time webhooks for call state changes
    • Stream live transcription to infer IVR state
    • Record calls for later audio processing
    • Operate within EU data residency requirements

We knew it would be too challenging to develop such a sophisticated telephony solution in-house, so we tested a few third-party solutions and settled on Telnyx as our telephony provider. Telnyx offered a combination of features that mapped directly to our needs: programmable voice with fine-grained call control, reliable webhook delivery, real-time transcription, and built-in call recording, all accessible through a straightforward Node.js SDK. Just as importantly, it provided EU-based infrastructure, which simplified compliance concerns around handling user data.

A typical voicemail call flow is driven almost entirely by webhooks — the API initiates the call, and everything that follows (answer detection, transcription events, recording availability) arrives asynchronously. That model shaped much of our system’s architecture. From there, the system reacts to events like call.answered, call.machine.detection.ended, call.transcription, and call.hangup — effectively turning every phone call into a distributed, event-driven workflow.

```ts
interface CreateCallParams {
  callId: string;
  from: string;
}

// Service method that places the outbound probe call via Telnyx.
// Webhooks for this call are routed to a per-call URL using callId.
async createVoicemailProbeCall({ callId, from }: CreateCallParams): Promise<string> {
  const request = {
    connection_id: this.configService.get<string>('telnyx.connectionId'),
    to: this.configService.get<string>('telnyx.voiceMailPhoneNumber'),
    from,
    answering_machine_detection: 'detect_words',
    webhook_url: `${this.configService.get<string>('telnyx.webhookUrl')}/webhook/${callId}`,
  };

  try {
    const response = await this.client.calls.create(request);
    const callSessionId = response.data.call_session_id;

    this.logger.log({
      event: 'call.created',
      provider: 'telnyx',
      callId,
      callSessionId,
      from: this.maskPhoneNumber(from),
    });

    return callSessionId;
  } catch (error) {
    this.logger.error({
      event: 'call.failed',
      provider: 'telnyx',
      callId,
      from: this.maskPhoneNumber(from),
      error: error?.message,
    });

    throw new Error('Failed to initiate voicemail probe call');
  }
}
```

In effect, our system dialed the carrier’s voicemail number via Telnyx, entered a target phone number, and listened to the response. If the system responded with a prompt asking for a PIN, we knew the voicemail box existed. If it hung up or rerouted elsewhere, it didn’t. In practice, this required wiring together call control, real-time transcription, and a small state machine driven entirely by asynchronous webhooks. 

Probe Call System Webhooks:

    • The call is initiated without a PIN
    • Once the call is answered, we start real-time transcription
    • We send only the phone number via DTMF
    • A transcription listener watches for phrases equivalent to “enter your PIN”
    • If detected, the call is marked as COMPLETED — voicemail exists
    • If the call ends before that, we treat it as NO_VOICEMAIL

This webhook system may look and sound deterministic. But it wasn't. The webhook events arrived out of order. Transcription lagged behind the audio. Sometimes the hangup event arrived before the phrase we were waiting for. To handle this, the hangup handler deliberately waited before making a final decision – a small delay that gave the transcription pipeline time to catch up and resolve the race condition.

Even detecting a simple phrase like “enter your PIN” turned out to be unreliable.

Robocall of Duty: Pinpointing Our Solution 

The Telenor voicemail system operated exclusively in the Swedish language, and real-time speech-to-text output was inconsistent enough that exact matching wasn’t a viable solution. The same phrase could appear in dozens of slightly different forms depending on timing, audio quality, and model interpretation.

So we ended up building a two-stage matching layer:

    • A dictionary of known phrase variations (dozens of key prompts)
    • A sliding-window scan that prioritized the most recent transcript fragments, where state transitions were most likely to occur

In other words, we weren't just listening for words – we were inferring system state from the imperfect signals of a live phone call. Only after confirming that a voicemail inbox existed were we able to proceed with user authentication.

At that point, the flow became more conventional. The user verified ownership of the phone number via a one-time password sent over SMS using Telnyx. This established the first trust boundary: proving that the caller controls the number they’re attempting to access.

By this point, two things were true: the phone number had an active voicemail box, and the user had verified ownership of that number via SMS. What remained was the part that actually unlocked access – the voicemail PIN.

Without the PIN, the system was able to dial into the carrier's IVR, but it couldn't progress past the first prompt. With it, we gained full access to a user's private voice messages. That made the PIN one of the most sensitive pieces of data in the entire system. Unlike an OTP, which expires in minutes, a voicemail PIN is a long-lived secret tied directly to personal communication. Mishandling it would effectively expose a user's private messages. Because of that, we intentionally constrained how we handled the PIN.

Rather than transmitting the PIN in plaintext or relying on naive encryption, we used a hybrid encryption scheme. The client encrypted the PIN using a one-time symmetric key (AES-256-GCM), which was then encrypted with the server’s public key. The server stored only the encrypted payload.
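A minimal sketch of that envelope-encryption flow using Node's crypto module (the function names and payload shape here are our own for illustration, not the production code):

```typescript
import {
  generateKeyPairSync,
  randomBytes,
  createCipheriv,
  createDecipheriv,
  publicEncrypt,
  privateDecrypt,
  constants,
} from 'crypto';

// Client side: encrypt the PIN with a one-time AES-256-GCM key,
// then wrap that key with the server's RSA public key.
function encryptPin(pin: string, serverPublicKeyPem: string) {
  const key = randomBytes(32); // one-time symmetric key
  const iv = randomBytes(12);  // GCM nonce
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(pin, 'utf8'), cipher.final()]);
  const authTag = cipher.getAuthTag();
  const wrappedKey = publicEncrypt(
    { key: serverPublicKeyPem, padding: constants.RSA_PKCS1_OAEP_PADDING },
    key,
  );
  // The server stores only this payload; the plaintext PIN never leaves the client.
  return { ciphertext, iv, authTag, wrappedKey };
}

// Server side, at call time: unwrap the key, decrypt the PIN in memory, use it, discard it.
function decryptPin(
  payload: ReturnType<typeof encryptPin>,
  serverPrivateKeyPem: string,
): string {
  const key = privateDecrypt(
    { key: serverPrivateKeyPem, padding: constants.RSA_PKCS1_OAEP_PADDING },
    payload.wrappedKey,
  );
  const decipher = createDecipheriv('aes-256-gcm', key, payload.iv);
  decipher.setAuthTag(payload.authTag);
  return Buffer.concat([
    decipher.update(payload.ciphertext),
    decipher.final(),
  ]).toString('utf8');
}
```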

At call time, the PIN was decrypted in memory, sent as DTMF tones, and immediately discarded. It was never logged, never persisted in plaintext, and existed only for the brief moment it was needed. The guiding principle was simple: minimize the lifetime and exposure of sensitive data. Encrypt at the boundary, decrypt only at the moment of use, and never store what you don’t need. With voicemail existence verified, identity confirmed, and credentials handled safely, the real challenge began.

After the Beep: Distributing Outbound Voicemail Calls

Database-backed Call Distribution 

One practical problem appeared before any transcription or audio processing: outbound calls could not all originate from a single number. At low volume, using one caller ID works. At higher volume, repeatedly dialing the same carrier voicemail system from the same originating number increases the risk of rate limiting, call filtering, or degraded reliability. We needed a way to spread active calls across a pool of outbound numbers.

We solved this problem at the application layer. Each new call queried the database for the numbers currently in use, counted the active calls assigned to each, and selected the least-loaded candidate. In practice, this was enough: simple, predictable, and requiring no additional infrastructure.
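The selection step itself reduces to a few lines. In sketch form (the table and column names in the comment are assumptions):

```typescript
// The per-number active-call counts would come from a query along the lines of:
//   SELECT from_number, COUNT(*) AS active_calls
//   FROM calls WHERE status = 'ACTIVE' GROUP BY from_number;
function pickLeastLoadedNumber(
  pool: string[],
  activeCounts: Map<string, number>,
): string {
  let best = pool[0];
  let bestLoad = activeCounts.get(best) ?? 0;
  for (const candidate of pool.slice(1)) {
    const load = activeCounts.get(candidate) ?? 0; // numbers with no rows are idle
    if (load < bestLoad) {
      best = candidate;
      bestLoad = load;
    }
  }
  return best;
}
```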

This was not sophisticated scheduling, but it didn’t need to be. The goal was to avoid concentrating traffic on a single outbound number. For our workload, a small amount of database-driven balancing was sufficient.

Turning a Phone Call into a State Machine

Distributing calls across outbound numbers solved one operational problem. It did not solve the harder one: once the call connected, the entire system had to infer state from a stream of asynchronous webhook events and imperfect live transcription.

At that point, a phone call stopped being “just a call” and became an event-driven workflow.

That workflow had to answer questions like:

    • Has the carrier answered?
    • Are we speaking to a machine or a human-operated IVR?
    • Has the system asked for the PIN yet?
    • Was the PIN rejected?
    • Are there new messages?
    • Are there no more messages?
    • Has the call ended normally, or did it fail before we understood why?

None of the answers to these questions came from a single reliable source. They had to be reconstructed from webhook timing, DTMF actions, and noisy Swedish speech-to-text output.

A Webhook-driven Call State Machine 

Telnyx exposes call lifecycle events as webhooks: call.answered, call.machine.detection.ended, call.transcription, call.recording.saved, and call.hangup.

That sounds straightforward until you build on top of it.

As I previously mentioned, these events are asynchronous, can arrive close together, and do not always line up neatly with application expectations. The system therefore treats the call as a state machine, with each webhook advancing or resolving a part of the flow.

At a high level, the service does four things:

    • Verifies webhook authenticity
    • Loads the current persisted call state
    • Dispatches handling based on event type
    • Updates call status and triggers follow-up actions

A simplified version looks like this:
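In sketch form, with status names and branch bodies as illustrative assumptions rather than our production code:

```typescript
type CallStatus = 'DIALING' | 'IN_PROGRESS' | 'COMPLETED' | 'FAILED' | 'NO_VOICEMAIL';

interface CallState {
  callId: string;
  status: CallStatus;
  transcript: string[];
}

// Dispatch a verified webhook against the persisted call state.
// Each branch must tolerate events arriving late or out of order.
function handleWebhook(
  state: CallState,
  eventType: string,
  payload: Record<string, unknown>,
): CallState {
  switch (eventType) {
    case 'call.answered':
      return { ...state, status: 'IN_PROGRESS' };
    case 'call.machine.detection.ended':
      // Decide probe vs. retrieval flow, start transcription, send DTMF…
      return state;
    case 'call.transcription':
      // Append the fragment; phrase matching runs over the updated transcript.
      return { ...state, transcript: [...state.transcript, String(payload.text ?? '')] };
    case 'call.hangup':
      // Don't decide immediately: a late transcription event may still flip the outcome.
      return state.status === 'COMPLETED' ? state : { ...state, status: 'NO_VOICEMAIL' };
    default:
      return state; // Unknown events are ignored, never fatal.
  }
}
```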

The important part is not the switch statement itself; it's that each branch operates under uncertainty. A call can already be ending while transcription is still catching up. A recording webhook can arrive after the call state has already been marked as failed. A phrase indicating success can appear seconds before or after the hangup event that contradicts it.

That is why almost every transition has to be defensive.

Call Control through DTMF & Live Transcription 

Once machine detection completed, the system had to decide what type of call it was handling. For a voicemail existence check, it started transcription and sent only the phone number. For a full retrieval call, it started transcription, began recording, decrypted the user's PIN, and then sent both the phone number and the PIN as DTMF tones.

The DTMF sequence itself had to be tuned empirically. The carrier IVR was sensitive to timing, so sending digits too early or too quickly would fail unpredictably. In practice, the most reliable pattern was to add explicit wait intervals before entering the phone number and then once again before entering the PIN.
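In sketch form, the timed sequence looked something like this. The wait values below are placeholders, not the empirically tuned production values, and sendDtmf stands in for the provider's send-DTMF call command:

```typescript
interface DtmfStep {
  waitMs: number; // explicit pause before sending
  digits: string; // DTMF digits to send
}

// Build the plan: wait for the IVR greeting, send the number,
// then wait again before the PIN prompt.
function buildDtmfPlan(phoneNumber: string, pin?: string): DtmfStep[] {
  const plan: DtmfStep[] = [{ waitMs: 3000, digits: phoneNumber }];
  if (pin !== undefined) {
    plan.push({ waitMs: 2000, digits: pin });
  }
  return plan;
}

// Execution is just sequential sends with explicit delays between them.
async function executeDtmfPlan(
  plan: DtmfStep[],
  sendDtmf: (digits: string) => Promise<void>,
): Promise<void> {
  for (const step of plan) {
    await new Promise((resolve) => setTimeout(resolve, step.waitMs));
    await sendDtmf(step.digits);
  }
}
```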

This kind of logic looks fragile because it is fragile. It is automation against a human-oriented IVR, so timing becomes part of the protocol whether you like it or not.

Swedish Speech Recognition Required a Fuzzy Matching Layer

Live transcription was essential because the entire call flow depended on understanding what the carrier’s voicemail system was saying in real time. In theory, that sounds like a straightforward speech-to-text problem. In practice, it absolutely wasn’t.

The prompts were all in Swedish, delivered over phone audio, and transcribed live while the call was still in progress. The result was noisy enough that exact string matching failed almost immediately. The same phrase could be rendered in numerous different ways depending on timing, line quality, speaker cadence, and model behavior. Instead of treating transcription output as truth, we had to treat it as a noisy signal.

To make it usable, we built a matching layer with three components:

    • A dictionary of known variants for key prompts
    • Exact matching for the obvious cases
    • Levenshtein similarity as a fallback for imperfect matches 

We also biased matching toward the end of the transcript first, since the most recent words were usually the most relevant to the current state of the call.
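A condensed sketch of that matcher follows. The Swedish variants shown are illustrative, not the full production dictionary, and the similarity threshold is an assumed value:

```typescript
// Dictionary of known phrase variations, keyed by intent.
const PROMPT_VARIANTS: Record<string, string[]> = {
  typePin: ['slå din kod', 'slå in din kod', 'ange din kod'],
};

// Classic dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) => [
    i,
    ...Array(b.length).fill(0),
  ]);
  for (let j = 0; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,
        dp[i][j - 1] + 1,
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),
      );
    }
  }
  return dp[a.length][b.length];
}

// Stage 1: exact substring hit. Stage 2: Levenshtein similarity as fallback.
function matchIntent(fragment: string, threshold = 0.8): string | null {
  const text = fragment.toLowerCase().trim();
  for (const [intent, variants] of Object.entries(PROMPT_VARIANTS)) {
    for (const variant of variants) {
      if (text.includes(variant)) return intent; // exact hit
      const similarity =
        1 - levenshtein(text, variant) / Math.max(text.length, variant.length);
      if (similarity >= threshold) return intent; // fuzzy hit
    }
  }
  return null;
}
```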

This gave us a reliable way to infer intents such as:

    • Enter PIN
    • Wrong PIN
    • No voicemail
    • No new messages
    • No more new messages
    • New message 

But this is where things get counterintuitive. We didn’t just use transcription for observability or debugging. We used it to control the call itself.

Transcription Wasn't Just Metadata: It Drove Call Control

The transcription stream became the decision engine for the entire system.

Every detected phrase could immediately trigger a state transition or a side effect:

    • typePin during a probe call → voicemail exists
    • wrongPin → mark call as failed and hang up
    • youHaveNoNewMessage → stop processing early
    • youHaveNoMoreNewMessage → finish and optionally trigger follow-up logic
    • newMessage → emit a marker used later in audio processing

This creates an unusual dynamic: the system is making real-time decisions based on data it does not fully trust.

That’s why the fuzzy matching layer isn’t just a convenience — it’s a safety mechanism. Without it, small transcription inconsistencies would translate directly into incorrect behavior. At this point, the system stops looking like a conventional backend service and starts behaving more like a control loop:

listen → interpret → act → repeat 

Each step depends on imperfect input, and each decision affects what happens next in a live phone call.

When a Hangup isn’t the End of the Story 

One subtle problem appeared during voicemail detection. If the probe call ended before the system heard a “type your PIN” prompt, the natural conclusion was that the number had no voicemail box. But in real calls, the final transcription event and the hangup webhook could arrive almost simultaneously.

That created a race: the call might actually be valid, but the hangup handler could mark it as NO_VOICEMAIL before the transcription handler had time to mark it as successful.

The fix was simple and very intentional: do not decide immediately.
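In sketch form, the grace period looks like this (the delay value shown is an assumption; in practice it was tuned by hand):

```typescript
type ProbeOutcome = 'COMPLETED' | 'NO_VOICEMAIL';

// On hangup, wait briefly before concluding, then re-check whether the
// transcription handler marked the call as successful in the meantime.
async function decideAfterHangup(
  readStatus: () => ProbeOutcome | 'PENDING',
  graceMs = 1500, // assumed value
): Promise<ProbeOutcome> {
  await new Promise((resolve) => setTimeout(resolve, graceMs));
  return readStatus() === 'COMPLETED' ? 'COMPLETED' : 'NO_VOICEMAIL';
}
```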

Please Leave a Message: Turning Call Recordings into Individual Voicemail Files

Recording the voicemail session was only the first step. The real goal was to turn that single call recording into something the user could actually use: one clean audio file per message, with the carrier’s IVR prompts removed. At first, transcription looked like the obvious way to do this.

In theory, the live transcription stream contained all the markers we needed - phrases like “next message”, “received at”, and other prompts that separate one voicemail from the next. But in practice, transcription was the wrong source of truth for audio segmentation. There were two reasons.

First, recording delivery and transcription events were asynchronous. By the time a transcription fragment arrived, we knew what was said, but not with enough precision to map it back to an exact timestamp in the final recording. The only reliable timestamp we had was when the webhook reached our system, not when the phrase occurred in the call audio.

Second, we deliberately chose not to persist transcription data. The recordings could contain highly sensitive voicemail content, and we wanted to minimize the amount of derived user speech data we stored. That decision improved privacy, but it also meant transcription could not serve as a durable indexing layer for later processing.

So instead of treating transcription as the source for message boundaries, we moved the problem into the audio domain.

Audio Fingerprinting Instead of Transcript Timing 

The breakthrough was a small .NET service built on top of the SoundFingerprinting library.

Rather than trying to reconstruct message boundaries from webhook timing, we created a set of short audio samples for the carrier phrases that reliably appear around message transitions, such as "next message" and "received at".

These samples were fingerprinted in advance and stored in memory. When a full voicemail recording arrived, the service scanned it for matching audio signatures and returned approximate positions for those known prompts.

This gave us something transcription could not: markers anchored directly to the recording itself.

At startup, the fingerprinting service loads a library of short carrier prompt samples, hashes them once, and stores the resulting fingerprints in memory for later matching.

Fingerprints found the prompts. Silence found the cuts. 

Fingerprinting alone was not enough to extract usable voicemail files.

A prompt like “next message” tells you roughly where a message begins, but not where it should be cleanly cut. To solve that, we combined fingerprint matches with silence detection from FFmpeg.

FFmpeg scans the waveform for silence intervals:
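A typical silencedetect pass, wrapped in Node. The noise threshold and minimum silence duration below are assumed values, not our production settings:

```typescript
import { execFile } from 'child_process';

// Run FFmpeg's silencedetect filter over a recording. The filter writes its
// findings to stderr; nothing is transcoded (-f null).
function detectSilence(recordingPath: string): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(
      'ffmpeg',
      ['-i', recordingPath, '-af', 'silencedetect=noise=-30dB:d=0.5', '-f', 'null', '-'],
      (err, _stdout, stderr) => (err ? reject(err) : resolve(stderr)),
    );
  });
}

interface SilenceInterval {
  start: number;
  end: number;
}

// Pull "silence_start: 12.34" / "silence_end: 15.67" pairs out of the stderr log.
function parseSilenceIntervals(log: string): SilenceInterval[] {
  const extract = (key: string): number[] => {
    const re = new RegExp(key + ': ([\\d.]+)', 'g');
    const out: number[] = [];
    let m: RegExpExecArray | null;
    while ((m = re.exec(log)) !== null) out.push(Number(m[1]));
    return out;
  };
  const starts = extract('silence_start');
  const ends = extract('silence_end');
  return starts.map((start, i) => ({ start, end: ends[i] ?? start }));
}
```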

We parse the resulting silence_start and silence_end timestamps and use them as natural cut points.

The pipeline worked like this: 

  1. Download the call recording
  2. Fingerprint it against known IVR prompt samples
  3. Run silence detection across the full waveform
  4. Align each detected prompt with the nearest silence boundaries
  5. Trim the audio into one file per voicemail
  6. Upload the final clips to S3
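Step 4 hinges on a small alignment routine. A sketch, with function names and data shapes of our own choosing:

```typescript
// Snap each detected prompt position to the nearest silence boundary
// so every cut lands in quiet audio.
function snapToNearestBoundary(promptAt: number, boundaries: number[]): number {
  if (boundaries.length === 0) return promptAt;
  let nearest = boundaries[0];
  for (const b of boundaries) {
    if (Math.abs(b - promptAt) < Math.abs(nearest - promptAt)) nearest = b;
  }
  return nearest;
}

// Turn snapped prompt positions into [start, end] spans, one per voicemail,
// ready to hand to FFmpeg for trimming.
function messageSpans(
  promptPositions: number[],
  boundaries: number[],
): [number, number][] {
  const cuts = promptPositions
    .map((p) => snapToNearestBoundary(p, boundaries))
    .sort((a, b) => a - b);
  const spans: [number, number][] = [];
  for (let i = 0; i < cuts.length - 1; i++) {
    spans.push([cuts[i], cuts[i + 1]]);
  }
  return spans;
}
```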

The key idea is simple but important:

    • Fingerprinting tells us what happened and roughly when
    • Silence detection tells us where to cut cleanly

Together, they turn one continuous recording into structured, per-message audio - without relying on fragile transcription timing.

Handling Noisy Matches & Overlapping Samples

The real recordings weren’t clean. Some prompts overlapped. Some short variants matched close to longer ones. Others appeared within seconds of each other and would create duplicate or incorrect boundaries if left untreated.

So we added a normalization layer:

    • sort matches by timestamp
    • deduplicate within a tolerance window
    • prefer longer, more informative samples over short variants
    • ignore anything that isn’t a message boundary
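A sketch of that normalization pass (field names and the tolerance value are assumptions):

```typescript
interface FingerprintMatch {
  sample: string;       // which prompt sample matched
  at: number;           // position in the recording, seconds
  sampleLength: number; // duration of the matched sample, seconds
}

// Sort matches by timestamp, deduplicate within a tolerance window,
// and prefer the longer (more informative) sample when two collide.
function normalizeMatches(
  matches: FingerprintMatch[],
  toleranceSec = 2,
): FingerprintMatch[] {
  const sorted = [...matches].sort((a, b) => a.at - b.at);
  const result: FingerprintMatch[] = [];
  for (const m of sorted) {
    const last = result[result.length - 1];
    if (last && Math.abs(m.at - last.at) <= toleranceSec) {
      if (m.sampleLength > last.sampleLength) {
        result[result.length - 1] = m; // keep the longer variant
      }
    } else {
      result.push(m);
    }
  }
  return result;
}
```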

This logic isn’t particularly elegant, but it’s what made the output usable in production.

Why this approach worked 

This ended up being a much better fit than transcript-based segmentation.

The transcription was still useful for real-time control during the call, but it was too asynchronous and too imprecise to cut the final audio reliably. Fingerprinting and silence detection, by contrast, worked directly on the recording we were actually delivering to the user.

That separation turned out to be important:

    • transcription was for live decision-making
    • audio fingerprinting + silence detection was for post-call media extraction 

Once we split the system that way, the pipeline became much more deterministic.

Rafał Rybarczyk

Chief Technology Officer
