F# workflow for working with Twilio

tl;dr: using workflows and F# agents for request/response APIs, and using history replay for resilience and scalability

This post is part of the 2018 F# Advent calendar, the blog series run every year by the F# community, and is a follow-up to my post from last year, Answering the phone, functionally (this one probably won't make much sense if you've not read that post).

Last year I showed an F# computation expression / workflow DSL for handling an automated phone system. A similar approach would be ideally suited for implementing an Alexa Skill.

I was planning to implement the previous workflow in the new serverless Azure Durable Functions, allowing my santa phone system to run on the serverless platforms and scale easily.

However, looking at Durable Functions (DFs) in more detail, I came to the conclusion they weren't a good fit for what I was trying to achieve, so I shamelessly copied some ideas from DFs instead, in a rush to meet the blog deadline 😄

One of the nice things about using a workflow in the first place is that its interpretation is completely decoupled from its logic, so running it using a different approach didn't require any changes to the core code.

The code in this article is based on my Twilio integration, but the ideas all translate to Alexa and other similar products. I set up an example Alexa skill as a learning exercise, designed to encourage my two youngest boys that it was time to go to bed.

The first thing is to register with AWS on their Alexa dev site, set up a new skill, and enter the start words Alexa will recognise to activate your skill. So in the example below, if I say to Alexa "tell the boys", this skill will be activated.

Alexa start

It's then a case of setting up particular intents, which are the individual features the skill can provide. In this case I've just set up one, allowing the overall sentence of "tell the boys get ready for bed".

Alexa setup

Then just configure the endpoint AWS will call once it has recognised a skill+intent combination:

Alexa api

A simple F# Azure function that tells Alexa to speak a sentence and then play a comedy sound effect; very straightforward.

#r "System.Net.Http"
#r "Newtonsoft.Json"

open System.Net
open System.Net.Http
open Newtonsoft.Json

type Named = { name: string }
type OutputSpeech = { ``type``:string; ssml:string }
type Card = { ``type``:string; title:string; content:string }
type AlexaResponse = { outputSpeech: OutputSpeech; card: Card }
type AlexaWrapper = { version: string; response: AlexaResponse }

let reply() =
    {
        version = "1.0"
        response = {
            outputSpeech = {
                ssml = "<speak>Adam and Ben can you <emphasis level='strong'>PLEASE</emphasis> get ready for bed <audio src='https://s3.amazonaws.com/ask-soundlibrary/impacts/amzn_sfx_punch_01.mp3'/></speak>"
                ``type`` = "SSML"
            }
            card = {
                title = "The House"
                content = "Welcome to trying to control alex and tom"
                ``type`` = "Simple"
            }
        }
    }

let Run(req: HttpRequestMessage, log: TraceWriter) =
    let json = JsonConvert.SerializeObject(reply())
    new HttpResponseMessage(HttpStatusCode.OK, Content = new StringContent(json, System.Text.Encoding.UTF8, "application/json"))

This was just a simple end-to-end test. Alexa skills can be much more complex, with a back and forth between the user and the skill (i.e. the sort of thing the F# workflow is designed to make easy to write). The API reference contains all the details.

Quick intro to Durable Functions

Durable Functions are an add-on to Azure Functions, a way of dealing with state in a serverless environment. The idea is to have long-lived 'orchestration' durable functions which can handle complex patterns of logic that can persist over minutes/hours/days (another common term for this would be 'Sagas').

They can kick off other pieces of logic in normal serverless functions, and then go to sleep (i.e. no cost) until the other functions are finished. They can do sequences of calls, parallel calls, or waiting on 'events' from other systems.

DF fan pattern

They take advantage of the async/await features of C# (and generators in JavaScript) to make the code very linear and readable despite being highly asynchronous, e.g.

public static async Task<Status> Run(DurableOrchestrationContext ctx)
{
    var x = await ctx.CallActivityAsync<Record>("LoadFromDatabase");
    var y = await ctx.CallActivityAsync<Answer>("RunCalculations", x);
    var z = await ctx.CallActivityAsync<RecordID>("SaveToDatabase", y);
    return  await ctx.CallActivityAsync<Status>("EmailReport", z);
}

or the equivalent in JavaScript:

const df = require("durable-functions");

module.exports = df.orchestrator(function*(ctx) {
    const x = yield ctx.df.callActivity("LoadFromDatabase");
    const y = yield ctx.df.callActivity("RunCalculations", x);
    const z = yield ctx.df.callActivity("SaveToDatabase", y);
    return yield ctx.df.callActivity("EmailReport", z);
});

The clever thing to me is how durable functions are resurrected after being put to sleep via await (i.e. being shut down half way through their lifetime).

Behind the scenes the system persists any communication between a durable function and the outside world. When the function needs to wake up, rather than having frozen and stored its progress, the system uses an event sourcing approach to resurrect it back to where it was: the previous calls it made are played back to the code of the DF to return it to the state it was in before the sleep. The events are stored in Azure Storage, in a JSON format. The nice thing is that the 'resurrected' DF could be on a completely different machine on a different continent; it makes no difference (which is obviously essential for serverless), and it also nicely solves any reliability concerns from servers falling over etc.

In the examples above, whilst the DF is awaiting ctx.CallActivityAsync<Answer>("RunCalculations", x) it will be shut down completely. Once RunCalculations has returned a result, the DF will be recreated and replayed (so the result of ctx.CallActivityAsync<Record>("LoadFromDatabase") will come from the event store, not from a real call to the database), and then the code will carry on with the new result from RunCalculations.
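
The replay idea can be illustrated with a small F# sketch. This is not how Durable Functions are implemented; ReplayContext, CallActivity and the string results are all hypothetical stand-ins, and the sketch only shows the principle that recorded results are consumed first and real work happens once the history runs out.

```fsharp
/// Hypothetical stand-in for an orchestration context. Activity results are
/// taken from recorded history first and only executed once history runs out.
type ReplayContext(history: string list) =
    let pending = System.Collections.Generic.Queue<string>(history)
    let seen = ResizeArray<string>()
    let mutable realCalls = 0
    /// How many activities were actually executed (not replayed).
    member this.RealCalls = realCalls
    /// Every result observed so far, oldest first; this is what gets persisted.
    member this.History = List.ofSeq seen
    /// The name is only there for readability in this sketch.
    member this.CallActivity(name: string, run: unit -> string) =
        let result =
            if pending.Count > 0 then
                pending.Dequeue()            // replay: reuse the stored result
            else
                realCalls <- realCalls + 1
                run ()                       // live: really execute the activity
        seen.Add(result)
        result

/// A deterministic orchestration: same history in, same path through the code.
let orchestrate (ctx: ReplayContext) =
    let x = ctx.CallActivity("LoadFromDatabase", fun () -> "record-1")
    let y = ctx.CallActivity("RunCalculations", fun () -> x + ":answer")
    ctx.CallActivity("SaveToDatabase", fun () -> y + ":saved")

// First run: nothing recorded yet, so all three activities execute.
let first = ReplayContext []
let firstResult = orchestrate first

// Resurrected run: the first two results come from history,
// so only SaveToDatabase is executed for real.
let resumed = ReplayContext(List.take 2 first.History)
let resumedResult = orchestrate resumed
```

Both runs end with the same result, but the resumed one only performs a single real call, which is the property that lets the function be shut down and brought back anywhere.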

It is also possible to tweak how long a DF stays alive before sleeping, which keeps durable functions 'hot' for efficiency when you expect the awaits to be short.

A key thing for this to work is that the workflow must be deterministic. Replays have to get back to exactly the same spot, so the code must be written to always do the same thing given the same data (so even DateTime.Now is out) - see details on ensuring determinism.

Why DFs didn't work for voice on Twilio

I started off planning to implement the workflows inside Durable Functions, but during the implementation, and after some serious head scratching, I came to the conclusion that what I was trying to achieve wasn't going to work.

The essential problem is that durable functions are designed to handle async requests and responses (by piggybacking on Task), and not the synchronous request/response we need for Twilio (or Alexa) 🙁

Sad Buddy Elf

Although you can kick off a new durable function from an incoming HTTP request, or send new events to it once it is running, you can't really get any sync response back from the DF until it has completed. You can fake it using the DF custom status feature and busy waiting, but for the moment it's a bit of a hack, and the busy wait is going to increase your costs, so not really a good fit for what I was exploring.

There is already discussion on the GitHub repo about changes to support this, but this is really moving DFs toward a full agent system, which is a much more complex undertaking, and Microsoft already has two other agent frameworks (Orleans and Service Fabric).

Durable Functions would have been ideal if sync responses weren't required. A workflow based on SMS messages would have been straightforward; in fact Microsoft even have an example using SMS via Twilio.

Switchboard and CallHandlers

I've decided to host the individual calls using F# Agents (aka MailboxProcessor), called CallHandlers. Agents scale well by using the underlying thread pool, they manage state using tail-recursive async calls, and they avoid threading issues by handling messages in strict order.

I've also created a Switchboard agent which has the responsibility of tracking in-progress calls. In the code, the agents are wrapped in objects to hide away the implementation.

The diagram below shows the calls from incoming HTTP calls (in blue), being routed through the switchboard and then processed by the individual call handlers (in yellow).

Switchboard model

One interesting feature is how the switchboard can pass the AsyncReplyChannel straight through to the call handler. I copied this idea from @theburningmonk from his NDC 2016 talk discussing a similar approach.

NDC talk

In the sequence diagram below, you can see the CallHandler responding directly to the HttpEndpoint, not needing to go via the Switchboard.

sequence in memory

The logic of the switchboard is captured by the messages the agent handles. It is generic, requiring that the call ids be comparable, and that the handlers support disposal.

/// The protocol definition for interacting with the switchboard agent.
/// This is essentially a thread safe Map<'Id,'Handlers>
type SwitchBoardProtocol<'Id,'Handler when 'Id: comparison and 'Handler :> IDisposable> =
| StoreHandler     of id:'Id * handler:'Handler
| FindHandler      of id:'Id * reply: AsyncReplyChannel<'Handler option>
| RemoveHandler    of id:'Id    // This is also responsible for disposing the object
| RequestOfHandler of id:'Id * toRun:( Result<'Handler,string> -> unit )

The last message, RequestOfHandler is there to support directly passing a message through to the call handler.
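
As a rough sketch of how an agent might service this protocol (the real implementation is in the repository and may differ), the loop below keeps a Map of handlers as its only state. startSwitchBoard, FakeHandler and the "CA123" call id are hypothetical names used purely for illustration.

```fsharp
open System

type SwitchBoardProtocol<'Id,'Handler when 'Id: comparison and 'Handler :> IDisposable> =
    | StoreHandler     of id:'Id * handler:'Handler
    | FindHandler      of id:'Id * reply: AsyncReplyChannel<'Handler option>
    | RemoveHandler    of id:'Id
    | RequestOfHandler of id:'Id * toRun:(Result<'Handler,string> -> unit)

/// Sketch of the agent loop: the Map of handlers is the only state.
let startSwitchBoard () : MailboxProcessor<SwitchBoardProtocol<'Id,'Handler>> =
    MailboxProcessor<SwitchBoardProtocol<'Id,'Handler>>.Start(fun inbox ->
        let rec loop (handlers: Map<'Id,'Handler>) = async {
            let! message = inbox.Receive()
            match message with
            | StoreHandler (id, handler) ->
                return! loop (Map.add id handler handlers)
            | FindHandler (id, reply) ->
                reply.Reply(Map.tryFind id handlers)
                return! loop handlers
            | RemoveHandler id ->
                // Dispose the handler (if there is one) before forgetting it.
                Map.tryFind id handlers
                |> Option.iter (fun h -> (h :> IDisposable).Dispose())
                return! loop (Map.remove id handlers)
            | RequestOfHandler (id, toRun) ->
                // Hand the handler (or an error) to the caller's continuation.
                let found =
                    match Map.tryFind id handlers with
                    | Some h -> Ok h
                    | None   -> Error "No handler registered for this call"
                toRun found
                return! loop handlers }
        loop Map.empty)

/// Hypothetical handler used only to exercise the switchboard.
type FakeHandler(name: string) =
    let mutable disposed = false
    member this.Name = name
    member this.Disposed = disposed
    interface IDisposable with
        member this.Dispose() = disposed <- true

let board : MailboxProcessor<SwitchBoardProtocol<string,FakeHandler>> = startSwitchBoard ()
let handler = new FakeHandler("call-1")
board.Post(StoreHandler("CA123", handler))

// The reply channel is captured by the continuation, so the answer goes
// straight back to the original caller rather than via the switchboard.
let viaRequest =
    board.PostAndReply(fun (ch: AsyncReplyChannel<string>) ->
        RequestOfHandler("CA123", fun (result: Result<FakeHandler,string>) ->
            match result with
            | Ok h    -> ch.Reply(h.Name)
            | Error e -> ch.Reply(e)))

board.Post(RemoveHandler "CA123")
let afterRemove = board.PostAndReply(fun ch -> FindHandler("CA123", ch))
```

Because the agent processes messages strictly in order, the final FindHandler is only answered after the RemoveHandler has been handled and the handler disposed.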

The call handler messages are even simpler:

type CallDetails = { NumberFrom: string; NumberTo: string; CallAt: DateTime }

type CallResponse =
| CompletedResponse of PhoneCommand
| IntermediateResponse of PhoneCommand

type CallHandlerMessages =
| CallStarted of CallDetails * AsyncReplyChannel<CallResponse>
| DigitsProvided of string * AsyncReplyChannel<CallResponse>

When a call comes in, we post a CallStarted to the call handler and return the response. Any further calls are digits being entered, and they are sent to the call handler via DigitsProvided repeatedly until the call handler lets us know the call has finished via a CompletedResponse.
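
To make that exchange concrete, here is a toy version of the conversation. PhoneCommand is reduced to a hypothetical two-case type, startHandler is an invented stand-in for the real call handler (which runs a workflow program rather than a hard-coded rule), and the phone numbers are made up.

```fsharp
open System

type PhoneCommand = Say of string | Hangup of string   // hypothetical stand-in
type CallDetails = { NumberFrom: string; NumberTo: string; CallAt: DateTime }

type CallResponse =
    | CompletedResponse of PhoneCommand
    | IntermediateResponse of PhoneCommand

type CallHandlerMessages =
    | CallStarted of CallDetails * AsyncReplyChannel<CallResponse>
    | DigitsProvided of string * AsyncReplyChannel<CallResponse>

/// Toy handler: keeps prompting until the caller presses 1, then hangs up.
let startHandler () =
    MailboxProcessor<CallHandlerMessages>.Start(fun inbox ->
        let rec loop () = async {
            let! message = inbox.Receive()
            match message with
            | CallStarted (_, reply) ->
                reply.Reply(IntermediateResponse (Say "Press 1 to finish"))
                return! loop ()
            | DigitsProvided ("1", reply) ->
                reply.Reply(CompletedResponse (Hangup "Goodbye"))
                return ()                       // call is over, stop the loop
            | DigitsProvided (_, reply) ->
                reply.Reply(IntermediateResponse (Say "Press 1 to finish"))
                return! loop () }
        loop ())

// What the HTTP endpoint does, one request at a time.
let callHandler = startHandler ()
let details =
    { NumberFrom = "+440000000001"; NumberTo = "+440000000002"
      CallAt = DateTime(2018, 12, 1) }

let responses =
    [ callHandler.PostAndReply(fun ch -> CallStarted(details, ch))
      callHandler.PostAndReply(fun ch -> DigitsProvided("5", ch))
      callHandler.PostAndReply(fun ch -> DigitsProvided("1", ch)) ]
```

Each PostAndReply corresponds to one synchronous HTTP request from Twilio; the endpoint would stop routing to this handler once it sees a CompletedResponse.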

Copying the state replay

This all works well, so the next step is to consider how we would cope if we had lots of simultaneous calls, but big delays between key presses etc. The current setup would have all the 'in flight' call handling programs in memory. If they are complex, or have lots of memory in scope, we might run out of resources.

So instead I've attempted to copy the approach of saving state to allow in progress calls to be resurrected and 'replayed'.

sequence with replay

To do this, we add a new message type to the call handler, allowing persistence to be requested.

type internal CallHandlerMessages =
...
| PersistToStorage

We also need to make sure the state is captured as we go, so the internal state of the call handler agent contains the current program (if we have one, we may not if we've just started or we have been shutdown), as well as a list of all digits we have been sent by the caller.

type internal CallHandlerState = {
    CurrentProgram: (string -> Workflow.CallHandlingProgram) option
    Digits: string list
} with static member NotStarted = { CurrentProgram = None; Digits = [] }

Then the code required to replay history is surprisingly easy.

let resurrectViaReplay digitHistory =
    let replayError _ = Workflow.Complete (Workflow.Hangup "Failed to replay history")
    let state = CallHandlerState.NotStarted
    let rec stepDigits keyPresses program =
        match keyPresses with
        | []         -> program
        | key::other -> match (stepProgram ignore state (program key)).CurrentProgram with
                        | None -> replayError
                        | Some s -> stepDigits other s
    let nextStep = match (stepProgram ignore state <| getProgram()).CurrentProgram with
                    | None -> replayError
                    | Some p -> p
    nextStep |> stepDigits (List.rev digitHistory)

Note we need to detect replay errors, such as the phone program saying it has completed while we still have key presses to replay. This is likely caused by non-determinism in the system.

With this in place, the main agent loop for the call handler is changed to wait a certain period of time for a message. With no message arriving, or when explicitly told to persist, we release the call handling program. Next time it is required, it is then recreated via resurrectViaReplay.

let rec messageLoop (state: CallHandlerState) = async {
    let! message = inbox.TryReceive(int persistAfter.TotalMilliseconds)
    let newState =
        match message with
        | Some (CallStarted(_, caller)) ->
            stepProgram caller.Reply state <| getProgram()
        | Some (DigitsProvided(digits,caller)) ->
            let program = match state.CurrentProgram with
                          | Some program -> program
                          | None   ->       resurrectViaReplay state.Digits
            let state' = stepProgram caller.Reply state <| program(digits)
            { state' with Digits = digits :: state'.Digits }  // store for later replay if needed
        | Some PersistToStorage | None ->
            // production version: persist state.Digits to secure storage
            { state with CurrentProgram = None  }
    return! messageLoop newState }

Just like durable functions, we need to be careful with non-deterministic calls: all messages coming into the CallHandler from the outside world need to be stored. In pure languages like Haskell this would be enforceable in the type system, but in F# we just need to be careful.

Currently the call handling programs don't need to make any other calls (e.g. to look up a database record), but if that is required, we need to make sure we add a feature to:

Pass into the call program a standard wrapper around non-deterministic calls that memoises the response (in the same way we cache the digits pressed by the user)
Have the memoised response persisted into the state
Use it in replay like we do the digits

I was hoping to add this in but have unfortunately run out of time; it should be straightforward to do.
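
One possible shape for such a wrapper, as a minimal sketch only: CallLog, forReplay and recordOrReplay are hypothetical names that do not exist in the repository, and responses are simplified to strings. Results are kept newest-first, mirroring how the Digits list is stored.

```fsharp
/// Hypothetical log of outside-world responses.
/// Recorded is newest-first (like Digits); ToReplay is oldest-first.
type CallLog = { Recorded: string list; ToReplay: string list }

let emptyLog = { Recorded = []; ToReplay = [] }

/// Start a replay from a previously persisted (newest-first) list of responses.
let forReplay (persisted: string list) =
    { Recorded = []; ToReplay = List.rev persisted }

/// Run a non-deterministic call through the log: during replay the stored
/// response is returned and the real call is skipped.
let recordOrReplay (call: unit -> string) (log: CallLog) =
    match log.ToReplay with
    | stored :: rest ->
        stored, { Recorded = stored :: log.Recorded; ToReplay = rest }
    | [] ->
        let fresh = call ()
        fresh, { log with Recorded = fresh :: log.Recorded }

// A deliberately non-deterministic call: each real invocation gives a new answer.
let lookups = ref 0
let lookupCustomer () =
    lookups.Value <- lookups.Value + 1
    sprintf "customer-%d" lookups.Value

// Live run: the call is made and its response recorded.
let name1, log1 = recordOrReplay lookupCustomer emptyLog

// After persisting log1.Recorded and being resurrected, the same step
// gets the stored answer back without touching the outside world.
let name2, log2 = recordOrReplay lookupCustomer (forReplay log1.Recorded)
```

Threading the log through as a value keeps the replay logic pure, so it could sit alongside Digits in CallHandlerState and be persisted at the same point.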

The full code, with some basic tests, can be found on GitHub or here.

Future improvements

Have the switchboard be able to store all its own state as well, and restart from complete shutdown.
Have a similar timeout feature in CallHandler that detects the caller has gone away and cleans up fully, removing itself from the Switchboard.
Add a workflow and primitives for Alexa skills.

Conclusions

F# Workflows are great for modelling request/response systems like Alexa and Twilio (and you can use them in C# too)
Agents are very useful for encapsulating pieces of separate state, and their support for waiting for a new message with a timeout is also very useful
Pure functions make it possible to efficiently 'rehydrate' state via event sourcing of data, making it easy to have systems that can be shut down and started back up, without losing our nice linear programming model.

Happy Christmas again!

References and more information

santa

Answering the phone, functionally, again