I'll give you an introduction to what you'll be working with when using each verb. For more specific details, refer directly to Twilio's documentation.
The <Say> verb invokes Twilio's text-to-speech engine; that is, it gets Twilio to say things.
Inside the <Say> tag, you provide the text that you want it to speak. In Twilio parlance, what's nested inside the verb is called the noun. In this case, it's just plain text, but for many verbs, it will be further XML tags.
By setting attributes on the <Say> verb, we can switch the voice from male to female and also change the language of our text. The full list of options is in Twilio's documentation.
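In our <Say> verb, we can specify a range of attributes that customize what happens. These include voice, language, and loop (how many times to repeat the text, where 0 repeats it indefinitely).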
There are a few quirks worth noting with the <Say> verb:
- As we construct our TwiML with PHP, it's easy to dynamically generate the text that is spoken, as we did in call.php earlier in the chapter.
- Always test your <Say> verbs thoroughly by calling in yourself. Twilio might not always pronounce things perfectly, and you should be especially careful to check the pronunciation of numbers, dates, and amounts of money:
  - A great example is a number such as 1234 (for instance, a PIN or password). Twilio will say "one thousand two hundred and thirty-four" rather than "one two three four". If we want it to say the latter, we should write 1 2 3 4, with a space between each digit. We might also use a <Pause> verb between digits to keep them from being read too fast.
- With proper nouns, such as place names or products, you might need to be creative to get the right pronunciation. One way to do this is to spell things phonetically.
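Because the TwiML is generated by code, the digit-spacing trick can be applied automatically. The chapter's own examples are PHP. What follows is a minimal Python sketch using only the standard library's xml.etree.ElementTree, and the helper names (spaced_digits, say_pin) are hypothetical. It shows both approaches: a spaced-out string for a single <Say>, and separate <Say> verbs with a <Pause> between digits.

```python
import xml.etree.ElementTree as ET


def spaced_digits(pin: str) -> str:
    """'1234' -> '1 2 3 4', so Twilio reads each digit individually."""
    return " ".join(pin)


def say_pin(pin: str, pause_seconds: int = 1) -> str:
    """Return TwiML that speaks each digit, with a <Pause> in between."""
    response = ET.Element("Response")
    for index, digit in enumerate(pin):
        if index > 0:
            # A short silence between digits so they aren't read too fast.
            ET.SubElement(response, "Pause", length=str(pause_seconds))
        ET.SubElement(response, "Say").text = digit
    return ET.tostring(response, encoding="unicode")


print(spaced_digits("1234"))  # 1 2 3 4
print(say_pin("1234"))
```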
The <Play> verb lets you play audio. This is useful for things such as hold music and your own voiceovers, where text-to-speech just seems a little too awkward.
Inside the <Play> tag, provide the URL of the audio file to be played. Twilio supports a number of formats, but you'll almost certainly want to use either MP3 or WAV.
You can also use the <Play> verb to play DTMF tones (that is, the sounds made when you press a number on your phone's keypad) to test other phone systems. We won't cover this, as it's very much an edge case.
As with the <Say> verb, the loop attribute is supported, allowing us to repeat our audio clip as many times as we need, or forever.
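Here is a minimal Python sketch (standard library only; the chapter itself uses PHP) that builds <Play> TwiML with the loop attribute. The play_audio helper and the audio URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET


def play_audio(url: str, loop: int = 1) -> str:
    """Return TwiML that plays the audio at `url` `loop` times (0 = forever)."""
    response = ET.Element("Response")
    play = ET.SubElement(response, "Play", loop=str(loop))
    play.text = url  # the noun: the URL of an MP3 or WAV file
    return ET.tostring(response, encoding="unicode")


# Hypothetical URL; loop=0 keeps the hold music playing indefinitely.
print(play_audio("https://example.com/audio/hold-music.mp3", loop=0))
# <Response><Play loop="0">https://example.com/audio/hold-music.mp3</Play></Response>
```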
The most important caveat to remember with <Play> is that Twilio will cache the audio file you provide. This means that changing voiceovers or hold music is not necessarily as simple as replacing the file at its existing URL.
At the same time, Twilio's caching is useful because it saves bandwidth and, therefore, cost. This matters especially if you're hosting your audio with a provider such as Amazon S3 (http://aws.amazon.com/s3/).
Twilio will obey the standard HTTP cache headers to decide when to re-download your audio file and when to keep using a copy it has used previously; see https://www.twilio.com/help/faq/voice/how-can-i-change-the-cache-behavior-of-audio-files for details. So, to change audio files, you'll need to do one of the following:
- Wait for the caching period to finish
- Re-deploy your application, pointing to a fresh URL for the audio (for instance, uploading your audio files into directories named with the date of the application version, and then updating all the references in your TwiML)
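The second option can be automated. This Python sketch assumes a hypothetical layout in which each release's audio lives under a directory named after the release date. Each deploy then produces URLs that Twilio has never fetched, so there is no stale cached copy to serve. The function name and base URL are illustrative only.

```python
from urllib.parse import urljoin


def versioned_audio_url(base_url: str, version: str, filename: str) -> str:
    """Build a per-release URL such as <base>/<version>/<filename>."""
    # Normalise the base so urljoin treats it as a directory, not a file.
    return urljoin(base_url.rstrip("/") + "/", f"{version}/{filename}")


print(versioned_audio_url("https://example.com/audio", "2015-06-01", "hold-music.mp3"))
# https://example.com/audio/2015-06-01/hold-music.mp3
```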
The <Pause> verb waits silently for a specified number of seconds, or one second by default. It's simply used with the number of seconds to wait specified in the length attribute, as in <Pause length="5" />.
Note
Note that the <Pause> verb looks a little different from the verbs we've seen so far because it uses a self-closing tag. It has no noun, but it takes an attribute that represents the number of seconds for which you wish to wait. As we'll see later, the <Reject> verb works in a very similar way.
The <Gather> verb allows you to take input from a caller by asking them to enter digits on their phone's keypad. This allows you to build complex, interactive applications.
The <Gather> verb is slightly more complicated to use than the verbs we've seen so far, as you will be nesting other verbs inside it.
So, for example, we might nest a <Say> verb inside our <Gather> block to say a message and then wait for the caller's input.
For instance, we could say a message and then wait up to 10 seconds for the caller to enter a single digit in response.
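Here is a minimal Python sketch (standard library only; the chapter's examples are PHP) that builds that kind of TwiML. The prompt wording is hypothetical. action, numDigits, and timeout are the <Gather> attributes that send the input to digits.php, stop after one digit, and wait up to 10 seconds.

```python
import xml.etree.ElementTree as ET


def main_menu() -> str:
    """Return TwiML that asks a question and waits for one keypress."""
    response = ET.Element("Response")
    gather = ET.SubElement(
        response,
        "Gather",
        action="digits.php",  # Twilio POSTs the Digits parameter here
        numDigits="1",        # stop listening after a single digit
        timeout="10",         # wait up to 10 seconds for input
    )
    # A verb nested in <Gather> plays while Twilio listens for input.
    ET.SubElement(gather, "Say").text = "Press 1 to speak to our support team."
    return ET.tostring(response, encoding="unicode")


print(main_menu())
```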
Once a customer has entered a digit, Twilio will make a POST request to the action URL, which is digits.php, including the digits that the customer entered in the Digits parameter. This allows you to build awesome interactive applications. Let's look at what we can do in digits.php.
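Here is a minimal Python sketch of the kind of logic digits.php might contain. The handler name and the support number (a fictional 555 number) are hypothetical. The digits argument stands in for the Digits parameter that Twilio POSTs.

```python
import xml.etree.ElementTree as ET

SUPPORT_NUMBER = "+15555550100"  # hypothetical support line


def handle_digits(digits: str) -> str:
    """Return TwiML for digits.php, given the Digits parameter Twilio POSTed."""
    response = ET.Element("Response")
    if digits == "1":
        ET.SubElement(response, "Say").text = "Putting you through to support."
        # Follow-up verb: connect the caller to the support line.
        ET.SubElement(response, "Dial").text = SUPPORT_NUMBER
    else:
        ET.SubElement(response, "Say").text = "Sorry, that isn't a valid option."
    return ET.tostring(response, encoding="unicode")


print(handle_digits("1"))
```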
Inside our <Gather> block, we can nest not only <Play> verbs, but also <Say> and <Pause>.
Here, we first check whether the digit 1 has been entered. If it has, we ask Twilio to say a message and then add another verb afterwards. In this case, it's a <Dial> verb, through which we might add the caller to a queue or put them through to a particular number.
If a digit other than 1 was entered, we play an alternative message.
Tip
When you're using <Gather>, always test all of the paths through your call flow. This means trying every option; otherwise, it's easy to miss serious errors in your application.
Unsurprisingly, the <Record> verb lets you record the caller's voice. This is perfect for things such as building a voicemail service, registering participants' names for a conference call, or gathering feedback from users.
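Here is a minimal Python sketch (standard library only; the chapter uses PHP) of <Record> TwiML matching the description that follows. The prompt text is hypothetical. maxLength, transcribe, and action are standard <Record> attributes.

```python
import xml.etree.ElementTree as ET


def record_feedback() -> str:
    """Return TwiML that records up to 30 seconds and requests a transcript."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Please leave your feedback after the beep."
    ET.SubElement(
        response,
        "Record",
        maxLength="30",          # stop recording after 30 seconds
        transcribe="true",       # ask Twilio to transcribe the audio (charged)
        action="recording.php",  # Twilio POSTs the recording details here
    )
    return ET.tostring(response, encoding="unicode")


print(record_feedback())
```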
In the preceding code snippet, we record for up to 30 seconds and ask Twilio to try to transcribe the audio into text. Twilio then makes a POST request to recording.php with the URL of the recorded audio.
Note
Note that using Twilio's transcription feature costs $0.05 per minute transcribed.
Twilio will expect recording.php to return TwiML too, to let it know what to do next. For instance, you might hang up the call or even play the caller's recording back to them so that they can check and confirm it.
Because you're given the recording URL, it's really easy to do all of this and much more, such as storing the recording in a database.
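Here is a minimal Python sketch of a recording.php-style handler (the chapter uses PHP). The params dictionary stands in for PHP's $_POST. RecordingUrl is the field Twilio sends, while the function name and message wording are hypothetical. Database storage is left as a comment rather than invented.

```python
import xml.etree.ElementTree as ET


def handle_recording(params: dict) -> str:
    """Return TwiML for recording.php.

    `params` stands in for PHP's $_POST: the form fields Twilio sends,
    including RecordingUrl. This is also where you might save the URL
    (and the caller's From number) to a database.
    """
    recording_url = params["RecordingUrl"]
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Thanks. Here is your recording."
    ET.SubElement(response, "Play").text = recording_url
    ET.SubElement(response, "Say").text = "Goodbye."
    ET.SubElement(response, "Hangup")  # serialized as a self-closing tag
    return ET.tostring(response, encoding="unicode")


print(handle_recording({"RecordingUrl": "https://example.com/rec.mp3"}))
```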
In the preceding code snippet, we take some of the data provided by Twilio in the request (that is, in $_POST) and store it in variables to use later. We then use it to form a TwiML response, which plays a message and then plays the caller's recording back to them. Finally, it says goodbye and hangs up.
Tip
The <Record> verb doesn't support nesting. If you want to record an entire call (probably the main case where you'll want to record while doing something else), the flow is slightly different and forms part of what we'll do with the <Dial> verb. We'll cover this later.
The <Message> verb allows you to send a text (SMS) or Multimedia Messaging Service (MMS) message as part of a phone call's flow. Using this verb is simple. On a basic level, you just nest plain text within it, representing the message that should be sent, as in <Message>Thanks for calling!</Message>.
If nothing else is specified, Twilio will automagically send the text to the current caller, with the number they called as the sender.
Of course, you might want to text someone else. For example, imagine that we have a fault-reporting service where we text an available engineer when someone reports a problem.
Here, we play a message to the caller and then send a text containing the caller's phone number to a provided phone number (stored in the $engineerPhoneNumber PHP variable).
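Here is a minimal Python sketch of the same idea (the chapter uses PHP). The to attribute redirects the <Message> away from the current caller. Both phone numbers and the function name are hypothetical. In a real handler, the caller's number would come from the From parameter Twilio sends.

```python
import xml.etree.ElementTree as ET


def report_fault(caller_number: str, engineer_number: str) -> str:
    """Thank the caller, then text their number to the on-call engineer."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Thanks. An engineer will call you back shortly."
    # Without `to`, Twilio would text the current caller instead.
    message = ET.SubElement(response, "Message", to=engineer_number)
    message.text = f"New fault reported. Please call {caller_number}."
    return ET.tostring(response, encoding="unicode")


# Hypothetical numbers.
print(report_fault("+15555550123", "+15555550142"))
```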
Tip
Most of the time, this won't be necessary, but we can set a status callback (statusCallback) on the <Message> verb to have our application notified of whether an SMS was successfully sent. For details, see Twilio's documentation at https://www.twilio.com/docs/api/twiml/sms/message.
Using the <Message> verb, we can also send MMS messages with included images. To do this, we nest a <Media> noun containing an image URL inside our <Message> verb. To include both an image and text, we nest a <Media> noun and a <Body> noun.
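Here is a minimal Python sketch (standard library only; the chapter uses PHP) of a <Message> carrying both a <Body> and a <Media> noun. The function name, text, and image URL are hypothetical.

```python
import xml.etree.ElementTree as ET


def picture_message(body: str, image_url: str) -> str:
    """Return TwiML for an MMS containing both text and an image."""
    response = ET.Element("Response")
    message = ET.SubElement(response, "Message")
    ET.SubElement(message, "Body").text = body
    ET.SubElement(message, "Media").text = image_url  # a publicly reachable image
    return ET.tostring(response, encoding="unicode")


print(picture_message("Here's a map to our office.", "https://example.com/map.png"))
```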
Tip
At the moment, MMS messaging is only available with select Twilio phone numbers in the US and Canada, but this will expand in due course.
If you're not interested in sending messages as part of a call, don't worry; we'll cover how to send outbound SMSes via the REST API on an ad-hoc basis in the next chapter.
As part of offering a full suite for building phone services, Twilio supports queuing, which is very popular in call centers and similar applications, through the <Enqueue> verb.
Simply nest the name of the queue that the caller should join within the verb.
On the <Enqueue> verb, we can specify a waitUrl attribute. This should point to a TwiML document that Twilio will run repeatedly for the caller while they wait in the queue.
By default, Twilio plays its own hold music. By specifying our own file, we can play different music or even read the caller's position in the queue to them. To use our own waiting TwiML, we point waitUrl at it, as in <Enqueue waitUrl="waiting.php">support</Enqueue>.
Follow this up by writing your own waiting.php file.
Here, we play some of Twilio's wait music (you can see a list of the tracks they provide at http://s3.amazonaws.com/com.twilio.sounds.music/index.xml) and then read the caller's position in the queue back to them.
Tip
In the TwiML served from our waitUrl, most TwiML verbs are supported (the exception being <Dial>). This means that you can do a range of things during the wait, from playing a message as we did previously to collecting details from the customer with the <Gather> verb.
A call can be dequeued in three ways:
- By another caller being connected to the call through the <Dial> verb's <Queue> noun
- Via the REST API
- With the <Leave> verb
We'll cover these in detail later, but for now, let's take a look at the <Leave> verb.
This verb is a very simple one. It is used from a queue's waitUrl (see the preceding section), and it lets us remove the caller from the queue and run some alternative TwiML instead.
As a crude example, let's add a caller to our support queue but put a "we're now closed" message after our <Enqueue> verb.
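Here is a minimal Python sketch (standard library only; the chapter uses PHP) of that arrangement. The closing message and function name are hypothetical. The <Say> and <Hangup> placed after <Enqueue> are reached only once the caller leaves the queue.

```python
import xml.etree.ElementTree as ET


def support_line() -> str:
    """Queue the caller; anything after <Enqueue> runs only once they leave it."""
    response = ET.Element("Response")
    enqueue = ET.SubElement(response, "Enqueue", waitUrl="waiting.php")
    enqueue.text = "support"  # the queue name; Twilio creates it on first use
    ET.SubElement(response, "Say").text = "Sorry, we're now closed. Please call back tomorrow."
    ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


print(support_line())
```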
Once the caller has joined the support queue, execution of our TwiML document stops, and Twilio loops over the TwiML in waiting.php while the caller waits to be dequeued.
Only if and when the caller leaves the queue will Twilio continue executing our TwiML, so that the <Say> verb gets run.
We might want to remove callers from the queue at 6 p.m., when our customer support lines close. Let's write some TwiML in waiting.php, with the help of a little PHP, to do this.
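Here is a minimal Python sketch of that waiting.php logic (the chapter uses PHP). It assumes the closing time is 18:00 and that the current time is passed in explicitly so the logic can be tested. The hold-music URL is a hypothetical placeholder. QueuePosition is one of the parameters Twilio sends to the waitUrl.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

CLOSING_HOUR = 18  # 6 p.m. on a 24-hour clock
HOLD_MUSIC_URL = "https://example.com/audio/hold-music.mp3"  # hypothetical


def waiting(params: dict, now: datetime) -> str:
    """Return TwiML for waiting.php, which Twilio requests repeatedly.

    `params` stands in for the POST fields Twilio sends to the waitUrl,
    including QueuePosition; `now` is passed in so the logic is testable.
    """
    response = ET.Element("Response")
    if now.hour >= CLOSING_HOUR:
        # Remove the caller from the queue; the TwiML after <Enqueue> runs next.
        ET.SubElement(response, "Leave")
    else:
        ET.SubElement(response, "Play").text = HOLD_MUSIC_URL
        position = params.get("QueuePosition", "unknown")
        ET.SubElement(response, "Say").text = f"You are number {position} in the queue."
    return ET.tostring(response, encoding="unicode")


print(waiting({"QueuePosition": "3"}, datetime.now()))
```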
Here, we check whether the hour on a 24-hour clock is 18 or later (that is, 6 p.m. or after). If it is, we leave the queue, so that the final bit of TwiML in the previous snippet gets run. Otherwise, we play some hold music and then announce the caller's current position. waiting.php will simply be requested again and again while a caller queues.
Note
For <Leave>, we can use a self-closing tag because this verb is never used with a noun. We could write <Leave></Leave>, which is equivalent to <Leave />, but simply writing <Leave /> is quicker.
The <Dial> verb is probably the most important and, equally, the most complex of all the TwiML verbs.
It allows us to place outbound calls and bridge them to our current one, enabling tonnes of powerful applications, from connecting inbound calls to customer support staff to setting up conferences.
For example, as part of a call, we might dial through to one of our support staff.
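Here is a minimal Python sketch (standard library only; the chapter uses PHP) of TwiML matching the description that follows. The support number and message are hypothetical. record="record-from-answer" starts recording when the call connects, and action names the URL Twilio requests when the dialed call ends.

```python
import xml.etree.ElementTree as ET

SUPPORT_NUMBER = "+15555550100"  # hypothetical


def call_support() -> str:
    """Dial the support line, recording from the moment it's answered."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Connecting you to our support team."
    dial = ET.SubElement(
        response,
        "Dial",
        record="record-from-answer",  # start recording when the call connects
        action="recording.php",       # requested when the dialed call ends
    )
    ET.SubElement(dial, "Number").text = SUPPORT_NUMBER
    # Any verbs added after <Dial> would never run, because of `action`.
    return ET.tostring(response, encoding="unicode")


print(call_support())
```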
Here, we play a message and then call our customer support phone number. We record the call from the time it's answered and ask Twilio to post the recording to recording.php. When our <Dial> verb has an action, as is the case here, no TwiML verbs after it will be reached, as Twilio moves on to the action URL once the dialed call ends.
As always, Twilio has sensible defaults for the <Dial> verb, which can be customized. For instance, it will let the call ring for 30 seconds before giving up (the timeout attribute), and callerId will be set to the number of the current caller. You can discover all of the options in Twilio's documentation at https://www.twilio.com/docs/api/twiml/dial.
When you're using <Dial>, what you nest within it is very important. You've already seen <Number>, which calls a physical (that is, PSTN) phone. There are several nouns you can nest under <Dial> to make different kinds of calls, and we'll look at each in turn.
The <Number> noun lets you call a traditional phone number; nest one or more of these under your <Dial> verb to call them.
One of the most interesting features here is that you can actually try multiple phone numbers at once. For example, imagine that you want to ring more than one number for your customer support line.
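Here is a minimal Python sketch (standard library only; the chapter uses PHP) that nests several <Number> nouns under one <Dial>. The function name and phone numbers are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def ring_support(numbers: Sequence[str]) -> str:
    """Ring several numbers at once; the first to answer gets the call."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    for number in numbers:
        ET.SubElement(dial, "Number").text = number
    return ET.tostring(response, encoding="unicode")


# Hypothetical numbers, e.g. a desk phone and a mobile.
print(ring_support(["+15555550100", "+15555550101"]))
```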
With this TwiML, Twilio will attempt to call both numbers at the same time. As soon as someone picks up, it'll stop trying the other.
The <Number> noun also provides some advanced functionality that lets you control what happens when the dialed party has answered the call, such as the sendDigits and url attributes:
- With sendDigits, you can ask Twilio to send some DTMF tones to the called party when they pick up (for example, to reach a particular extension behind that number).
- With url, you can specify the URL or path of a piece of TwiML that will be run for the called party after they answer, but before they're connected to the current call.
Let's go through examples of both of those options.
First, we'll look at url. We'll start with our TwiML for the actual incoming call, just as we did previously.
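Here is a minimal Python sketch (standard library only; the chapter uses PHP) of that incoming-call TwiML. Each <Number> carries url="intro.php", so the screening TwiML runs for whichever agent answers. The function name and numbers are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def ring_support_screened(numbers: Sequence[str], screen_url: str = "intro.php") -> str:
    """Ring each agent; whoever answers first hears `screen_url` before connecting."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    for number in numbers:
        ET.SubElement(dial, "Number", url=screen_url).text = number
    return ET.tostring(response, encoding="unicode")


print(ring_support_screened(["+15555550100", "+15555550101"]))
```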
An intro.php file will be played to our customer support agents once they've picked up the call, but before the call is actually connected. This lets them reject the call if it's inconvenient.
We'll play a message to the customer support agent and then wait for their input. If they press a digit at the prompt, we'll move on to the digits.php TwiML file. Otherwise, the call will be rejected and thus hung up, leaving Twilio to keep trying the other number in the <Dial> block.
Lastly, we'll need to create a digits.php file to deal with the called party's input.
The agent will hear a quick message, and then Twilio will connect the dialed party to the original call.
Tip
You'll notice that we don't need to do anything to bridge the two calls; in this context, Twilio does so by default once there is no more TwiML left to run.
The sendDigits attribute is useful when we want to dial some digits once the called party picks up. This is handy for automating other phone menus and services, or for dialing an extension.
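Here is a minimal Python sketch (standard library only; the chapter uses PHP) that builds the sendDigits value from a wait time and an extension. Each w means a half-second pause, so 2.5 seconds becomes five w characters. The helper names, number, and extension are hypothetical.

```python
import xml.etree.ElementTree as ET

HALF_SECOND = 0.5  # each "w" in sendDigits waits this long


def send_digits_value(extension: str, wait_seconds: float) -> str:
    """Build a sendDigits string: `wait_seconds` of 'w' pauses, then the digits."""
    return "w" * round(wait_seconds / HALF_SECOND) + extension


def dial_extension(number: str, extension: str, wait_seconds: float = 2.5) -> str:
    """Dial `number`, then key in `extension` after the given pause."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    ET.SubElement(
        dial, "Number", sendDigits=send_digits_value(extension, wait_seconds)
    ).text = number
    return ET.tostring(response, encoding="unicode")


print(send_digits_value("100", 2.5))  # wwwww100
print(dial_extension("+15555550100", "100"))
```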
Here, we'll dial our number, but when they pick up, we'll wait for 2.5 seconds (each w character represents a half-second pause) and then dial 100, our imaginary extension.
Apart from dialing through to physical phones, we can also make calls on Twilio over Session Initiation Protocol (SIP). SIP is a standard (perhaps the standard) for Internet telephony, connecting together a range of phone networks.
The <Sip> noun effectively acts as a cheaper complement to making calls over the PSTN with the <Number> noun, at less than half the cost of calling a US phone number.
We dial a SIP URI (which identifies a particular client on a particular SIP server) by nesting it inside the <Sip> noun.
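Here is a minimal Python sketch (standard library only; the chapter uses PHP) of dialing a SIP URI. The URI and function name are hypothetical. It also sets a timeout on <Dial> to show that the usual <Dial> options still apply.

```python
import xml.etree.ElementTree as ET


def dial_sip(uri: str, timeout: int = 20) -> str:
    """Return TwiML that dials a SIP endpoint, giving up after `timeout` seconds."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial", timeout=str(timeout))
    ET.SubElement(dial, "Sip").text = uri
    return ET.tostring(response, encoding="unicode")


print(dial_sip("sip:alice@sip.example.com"))  # hypothetical SIP URI
```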
The <Sip> noun works with all of the various <Dial> verb options we've seen previously for calls using the <Number> noun. For example, we can ask to record calls or set a timeout on the <Dial> verb.
The url attribute is also available on the <Sip> noun and works in exactly the same way as it does for <Number>, letting us add call screening and other such features with ease.
Tip
We can even combine calls to different kinds of destinations under one <Dial> verb. For instance, we can simultaneously try a staff member's mobile phone and the SIP phone on their desk by nesting <Sip> and <Number> nouns under a <Dial> verb.
There are lots of niche options available when you're working with SIP in Twilio that aren't worth covering in this book. Examples include forcing the TCP or UDP transport for the connection and sending custom headers with the SIP request.
Often, SIP servers will have authentication on them to prevent unwanted calls. This will usually work in one of two ways: username and password or IP whitelisting.
Username and password protection
As part of our <Sip> noun, we can specify a username and password that Twilio should provide when sending the INVITE message to the SIP server. To do this, we simply use the username and password attributes on the noun.
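Here is a minimal Python sketch (standard library only; the chapter uses PHP) that puts username and password on the <Sip> noun. The URI and credentials are hypothetical placeholders. In practice, the credentials would come from configuration rather than being hard-coded.

```python
import xml.etree.ElementTree as ET


def dial_sip_authenticated(uri: str, username: str, password: str) -> str:
    """Return TwiML that dials a SIP URI, supplying credentials for the INVITE."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    ET.SubElement(dial, "Sip", username=username, password=password).text = uri
    return ET.tostring(response, encoding="unicode")


print(dial_sip_authenticated("sip:alice@sip.example.com", "twilio", "s3cret"))
```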
Working with IP whitelisting
Perhaps a more common (but harder to deal with) form of authentication is IP whitelisting. This is where you'll set up your SIP server to only accept inbound calls from certain IP ranges.
Fortunately, Twilio provides you with a list of the IP addresses from which the SIP traffic may come. You can find them at the bottom of the page at https://www.twilio.com/docs/sip.
You should revisit this page from time to time, as Twilio expects to add additional IPs to enhance scalability and reliability in future.
Twilio Client, which we reach through the <Client> noun, allows you to include voice capabilities within a web page or native app. This means that people can make and take calls from their own devices without using the legacy telephony networks or complicated SIP setups.
This makes it easy to build powerful telephony solutions, for example, browser-to-browser calling within a web application. Part of the magic, though, is that it's fully connected to the rest of Twilio's platform.
This means that we can set up Twilio Client in a browser and then take incoming calls to it over a traditional phone number. Twilio Client uses TwiML in exactly the same ways as we've already seen for both incoming and outbound calls.
By way of an example, you can imagine a phone conferencing service that uses this to enhance its functionality. Attendees and presenters on a call will be able to join in either through their browser using a headset, from their mobile phone via a custom app, or from any phone of their choice on a local phone number. All of this can be run through Twilio.
Each individual connected to Twilio with Twilio Client will have their own client identifier. It's unique within the scope of our Twilio account and is what we use to connect calls to a particular user. Connecting to a particular client within a <Dial> verb is very simple indeed: we just nest the identifier in a <Client> noun.
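Here is a minimal Python sketch (standard library only; the chapter uses PHP) of dialing a Twilio Client user. The client identifier "jenny" and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def dial_client(client_id: str) -> str:
    """Return TwiML that connects the call to a Twilio Client user."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    ET.SubElement(dial, "Client").text = client_id  # unique within the account
    return ET.tostring(response, encoding="unicode")


print(dial_client("jenny"))
```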
Tip
If Twilio Client sounds great, you're in luck. Refer to Chapter 3, Calling in the Browser with Twilio Client.
Using Twilio, we can easily build our own custom phone conferencing tool to rival commercial alternatives.
Doing this is a great option, as Twilio's APIs give you the power to perform all sorts of integrations and customizations. The <Conference> noun is quite complex, as it lets you wield almost all of the features you'd see in professional conferencing tools from your own code.
The <Conference> noun adds the caller to a conference room of your choice, creating the room if it doesn't already exist. Simply use the noun and place the name of a room inside it; you don't have to create the room ahead of time.
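There are a lot of options for customizing a conference, including startConferenceOnEnter, endConferenceOnExit, muted, beep, waitUrl, and record; Twilio's documentation describes each of them. Let's dive into an example.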
Let's build a basic conference where we'll have a passcode for presenters and a passcode for attendees. First, we'll create a TwiML file to handle incoming calls.
This file will play a message and wait up to 10 seconds for the caller to enter a six-digit code.
If they fail to enter the code, we'll hang up; in the real world, we'd probably want to do something nicer than this. Otherwise, Twilio will make a POST request to digits.php with the Digits parameter containing what was entered on the keypad. Let's create the digits.php file.
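Here is a minimal Python sketch of the logic described below for digits.php (the chapter uses PHP). The passcodes match the text. The room name, messages, and function name are hypothetical. endConferenceOnExit and startConferenceOnEnter are standard <Conference> attributes.

```python
import xml.etree.ElementTree as ET

PRESENTER_CODE = "123456"
ATTENDEE_CODE = "654321"


def join_conference(digits: str, room: str = "my-conference") -> str:
    """Return TwiML that routes a caller by the passcode they entered."""
    response = ET.Element("Response")
    if digits == PRESENTER_CODE:
        dial = ET.SubElement(response, "Dial")
        # The conference ends as soon as the presenter hangs up.
        ET.SubElement(dial, "Conference", endConferenceOnExit="true").text = room
    elif digits == ATTENDEE_CODE:
        ET.SubElement(response, "Say").text = "You are joining as an attendee."
        dial = ET.SubElement(response, "Dial")
        # Attendees alone can't start the conference; they wait for a presenter.
        ET.SubElement(dial, "Conference", startConferenceOnEnter="false").text = room
    else:
        ET.SubElement(response, "Say").text = "Sorry, that code wasn't recognized. Goodbye."
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")


print(join_conference("654321"))
```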
Here, we've got a fair bit of logic to go through:
- In the first couple of lines, we set our conference passcodes. In the real world, we'd want to connect this to a database that can handle our different conference codes, amongst other things.
- If the caller enters the presenter code, which is 123456, we add them to the conference. By default, the conference will start when they join (as soon as someone else is there), but we customize the endConferenceOnExit option so that the conference finishes the moment they leave.
- If the caller enters the attendee code, which is 654321, we play a message to them and add them to the conference. However, we customize the startConferenceOnEnter option so that the conference can never start until at least one presenter has joined.
- If the caller didn't enter one of the recognized codes, we say goodbye and then hang up.
From this, it should be evident that Twilio's conferencing is really quite powerful, especially when you use the various customizable options to deal with things such as recordings, moderation, and the waiting experience.
Tip
Twilio conferences only work with a maximum of 40 participants. If you need more callers than this, you'll need to stick with a traditional solution for the time being!
This final noun for <Dial>, <Queue>, allows us to pull a call out of a queue to which we've added a call (that is, a caller) with the <Enqueue> verb.
As with many of the previous nouns, we can specify a url attribute pointing to a TwiML document that will be played to the queued caller before they're connected to the person who has dialed the queue.
Let's try it for ourselves.
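Here is a minimal Python sketch (standard library only; the chapter uses PHP) of dialing into the support queue. The url="alert.php" attribute makes the waiting caller hear that TwiML before being connected. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def answer_next_in_queue(queue_name: str = "support", whisper_url: str = "alert.php") -> str:
    """Return TwiML that connects this call to the next caller in `queue_name`."""
    response = ET.Element("Response")
    dial = ET.SubElement(response, "Dial")
    # The queued caller hears `whisper_url` before the two calls are bridged.
    ET.SubElement(dial, "Queue", url=whisper_url).text = queue_name
    return ET.tostring(response, encoding="unicode")


print(answer_next_in_queue())
```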
When this TwiML is executed, the caller will be connected to the next caller in the support queue, after the person waiting in the queue has gone through the TwiML in alert.php.
Let's create an example alert.php file now, in which we'll tell the person that their call will be recorded.
The <Hangup> verb ends a call. It's used on its own as a self-closing tag with no nouns or attributes. For example, a <Say> verb containing a goodbye message followed by <Hangup /> says goodbye and then hangs up.
The <Redirect> verb immediately transfers execution from the current TwiML to the TwiML at a different URL, ignoring the rest of the current file.
Inside the <Redirect> verb, we provide the absolute URL or relative path of the TwiML file to be executed. We can also set the method attribute to GET; it defaults to POST.
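Here is a minimal Python sketch (standard library only; the chapter uses PHP) of a <Redirect> that switches the request method to GET. The target menu.php and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def redirect(target: str, method: str = "POST") -> str:
    """Return TwiML that hands control to the TwiML at `target`."""
    response = ET.Element("Response")
    ET.SubElement(response, "Redirect", method=method).text = target
    return ET.tostring(response, encoding="unicode")


print(redirect("menu.php", method="GET"))
# <Response><Redirect method="GET">menu.php</Redirect></Response>
```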
Note
The <Redirect> verb isn't only applicable to phone calls. Along with <Message>, it is one of only two verbs that can be used for both phone calls and incoming messages.
Nothing apart from the URL can be nested within the <Redirect> verb, and any verbs after it are ignored, as the redirect takes place right away.
The <Reject> verb, if placed as the very first verb in an incoming call, will prevent the call from being answered and will incur no cost whatsoever.
If placed elsewhere in the call, the call will hang up, but we will still be charged up to that point.
We can customize what the caller hears through the reason attribute. Setting it to rejected (the default) plays a standard not-in-service message, while setting it to busy plays a busy tone.
We can use this if we want to screen out certain types of calls. For example, imagine a situation where we're being spammed by a particular number.
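Here is a minimal Python sketch of that kind of screening (the chapter uses PHP). The blocked number and function name are hypothetical. params stands in for the POST fields Twilio sends, where From holds the caller's number. Because <Reject> is the first and only verb for blocked callers, the call is never answered.

```python
import xml.etree.ElementTree as ET

BLOCKED_NUMBERS = {"+15555550199"}  # hypothetical spam caller


def screen_call(params: dict) -> str:
    """Reject blocked callers before answering; welcome everyone else."""
    response = ET.Element("Response")
    if params.get("From") in BLOCKED_NUMBERS:
        # First verb in the document, so Twilio never answers the call.
        ET.SubElement(response, "Reject", reason="busy")
    else:
        ET.SubElement(response, "Say").text = "Thanks for calling."
    return ET.tostring(response, encoding="unicode")


print(screen_call({"From": "+15555550199"}))
```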