Introduction

Mosaix is a cloud-based Semantic-Search-as-a-Service platform for developers to implement natural language understanding (NLU) in their products.

This engine transforms natural language into computational instructions. It leverages public domain data sources and service APIs to assist end users in quickly finishing a task flow by interpreting a user’s utterance and fulfilling a desired action.

By using Mosaix's Deep Semantic Search engine, this API service enables developers to retrieve both the interpretations of a user's natural language input and their fulfillments for direct actions on the client side.

How It Works

Mosaix can be integrated into mobile applications, cloud-based software, bots, and speech-enabled IoT devices like smart TVs, speakers, headphones, connected cars, and more. Our Text-to-Interpretation API, powered by our state-of-the-art natural language understanding technology, can be used to understand a user’s voice and text queries and turn them into actionable insights (in the form of the Domain, Intent, and Entities of the query).

Concepts

Utterance (Query): User command from voice recognition or typing.

Domain: A Domain is a top-level concept that represents a user’s broader intent. It is a group of related Intents that clearly signals a user’s general Intent and what an application should do. Mosaix currently supports 29 Domains. Please contact the Mosaix team for the full list of Domains.

Intent: An Intent represents a mapping between what a user says and the desired actions that should be fulfilled by an application. Mosaix currently supports 149 Intents. Please contact the Mosaix team for the full list of Intents.

Entity: In our NLU engine, an Entity is the term given for the useful information in a user’s input for answering their query or determining specific values that are needed to perform an action. The Entities allow the NLU engine to recommend performing specific actions within a particular Intent.

Interpretations: Each Interpretation represents how the NLU engine understands a user's query and recommends a corresponding fulfillment.

Language

Mosaix is a multilingual platform. Our Text-to-Interpretation and Text-to-Fulfillment services currently support the following languages:

              EN:  English
              ES:  Spanish
              AR:  Arabic
              HI:  Hindi
              BN:  Bengali
              VI:  Vietnamese
      
Support for other languages is under active development. If you need support for a new language, feel free to reach out to the Mosaix team directly.

Getting Started

In order to use our API endpoints, you will have to apply for an account and obtain a token, client ID and secret key.

Currently, Mosaix’s API is still in closed beta, and applications are accepted by email only. Please send an email to dev@mosaix.ai with your company name and product information (product description, type, distributed location, etc.). Based on your request, Mosaix will reply with all necessary information for testing our APIs.

API Reference

As soon as you receive your token, client ID and secret key, you are ready to use Mosaix APIs. The following documentation will help you learn about our public API offerings, including their features, instructions for use, and descriptions of their responses. Currently Mosaix offers two services:

  • Text to Interpretation (TTI) : https://api.mosaix.ai/v2/texttointerpretation
  • Text to Fulfillment (TTF) : https://api.mosaix.ai/v2/texttofulfillment

The request and response for each one are explained and demonstrated in the following sections.

API requests from each token are limited according to your application usage requirements; details on these limitations are available in the email you will receive upon sign-up. If your assigned limit is not suitable for your needs, you may request a higher limit by emailing dev@mosaix.ai.

Text to Interpretation (TTI)

Text to Interpretation (TTI) here means sending a request to the Mosaix TTI API with a valid utterance and receiving a response of interpretations from the Mosaix NLU server.

The response details how the NLU engine understands a user’s query. An Interpretation includes the inferred Domain and Intent of a query, in addition to any Entities that appear in the query. These semantic understandings of the query help machines understand what a user is looking for and how to fulfill that need.

Request

The request is based on REST principles: data sources are accessed via standard HTTPS requests, encoded in UTF-8, to an API endpoint. For now, the API supports POST requests only.

The standard API endpoint is:

https://api.mosaix.ai/v2/texttointerpretation

Please check the example POST request.

              curl -X POST \
              https://api.mosaix.ai/v2/texttointerpretation \
                      -H 'Content-Type: application/json' \
                      -H 'Authorization: Bearer YOUR_API_TOKEN' \
                      -H 'Device-Id: YOUR_DEVICE_ID' \
                      -H 'Language-Code: YOUR_LANGUAGE_CODE' \
                      -H 'Region-Code: YOUR_REGION_CODE' \
                      -d '{
                              "query":"yellow by coldplay"
              }'
              
Each request must include valid headers. The placeholders are described below:

Field | Description
YOUR_API_TOKEN | Applied for and received from the Mosaix team. Required in each query.
YOUR_DEVICE_ID | Customer-defined ID. Used to track which device sent the request.
YOUR_LANGUAGE_CODE | ISO 639-1 code. Select one of the languages supported by Mosaix (see the Language section).
YOUR_REGION_CODE | ISO 3166-1 alpha-2 code. Mosaix does not restrict which region code is sent, but the fulfilled result may be affected by it.
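
The curl request above can also be issued from code. The following is a minimal Python sketch using only the standard library, not an official client. The function names build_request and send are hypothetical, and the token, device ID, language, and region values must be replaced with the ones you received from Mosaix.

```python
import json
import urllib.request

# Endpoint taken from the documentation above.
TTI_URL = "https://api.mosaix.ai/v2/texttointerpretation"


def build_request(url, query, token, device_id, language_code, region_code):
    """Build (but do not send) a POST request mirroring the curl example."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
        "Device-Id": device_id,
        "Language-Code": language_code,
        "Region-Code": region_code,
    }
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes the headers and body easy to inspect before anything is sent. A call would look like send(build_request(TTI_URL, "yellow by coldplay", "YOUR_API_TOKEN", "YOUR_DEVICE_ID", "en", "US")).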

Response

After getting a valid request, the response will be provided in JSON.

Please check the sample code below:

              {
                      "encoding": "utf-8",
                      "id": "d08567d9-27b7-42f6-b47e-9293d46502b8",
                      "language": "en",
                      "session": {},
                      "shouldEndSession": true,
                      "status": "Complete",
                      "timestamp": "2020-02-25T12:08:47.725Z",
                      "utterance": "play Yellow by Coldplay",
                      "version": "2.12.2",
                      "interpretations": [
                              {
                                      "domain": "Music",
                                      "intent": "Music.Play",
                                      "entities": [...],
                                      "resolution": [...],
                                      "score": 2.3955576419830324
                              }
                      ]
              }
              
The following table explains the top-level fields.

Field | Type | Description
version | String | API version number
utterance | String | User’s original query
session | String | Context info for multi-round conversation
status | String | Extra information for the client. For example, if an external API times out and the fulfillment is only partially returned, this field indicates a partial result. It can also indicate that the fulfillment failed. Enum values: complete (default value for single call), partial (for dialog mode only), failed (an error occurred)
encoding | String | Encoding method: UTF-8 or UTF-16
language | String | Detected language as a two-letter language code; defaults to en
timestamp | String | UTC timestamp of when the response was generated, in ISO string format: ‘YYYY-MM-DDTHH:mm:ss.sssZ’
id | String | Query ID for tracking purposes
interpretations | Array | The domain, intent, entities, and denotations parsed from the utterance. Explained in the following table

Each interpretation in the interpretations array represents how the NLU engine understands and parses a user’s query. The following table describes each field of the interpretation object.

Field | Type | Description
domain | String | Domain info for the client to take general actions
intent | String | Intent info for the client to route to specific services
entities | Array | Entity objects recognized by the NLU engine
resolution | Array | An array of entity resolution objects, based on the entity recognition module and values retrieved from the Mosaix KG. See the Resolution section below for details
score | Number | Confidence score of this semantic interpretation from the NLU (optional)
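
As an illustration of how a client might consume these fields, the sketch below picks the interpretation with the highest score from a decoded response. The function name best_interpretation is hypothetical, and treating a missing score as 0 is an assumption made here because the field is marked optional.

```python
def best_interpretation(response):
    """Return the highest-scoring interpretation, or None if unavailable."""
    # Compare the status case-insensitively as a defensive measure.
    if str(response.get("status", "")).lower() == "failed":
        return None
    interpretations = response.get("interpretations") or []
    if not interpretations:
        return None
    # Assumption: "score" is optional, so a missing score counts as 0.
    return max(interpretations, key=lambda item: item.get("score", 0))
```

The returned object still carries its domain and intent, so the caller can route on those values directly.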

Entities

An Entity defines how the NLU engine interprets a user’s query. In short, an Entity is the subject of a query used to fulfill a user’s request.

Please check the sample code below for an example of the format of an Entity:

      "interpretations": [
              {
                      "domain": "Music",
                      "intent": "Music.Play",
                      "entities": [
                               {
                                        "name": "song",
                                        "resolution": {
                                                 "value": "yellow",
                                                 "denotation": "",
                                                 "metadata": ""
                                        },
                                        "value": "Yellow",
                                        "type": "MusicSong",
                                        "confidence": 0
                               },    
                               {    
                                        "name": "artist",
                                        "resolution": {
                                                 "value": "coldplay",
                                                 "denotation": "SPOTIFY:4gzpq5DPGxSnKTe4SA8HAU,BOOM:2067406",
                                                 "metadata": ""
                                        },
                                        "value": "Coldplay",
                                        "type": "MusicArtist",
                                        "confidence": 0
                               }
                      ]
              }
      ]
      
Field | Type | Description
name | String | The name of the entity
type | String | The type of the entity
value | String | The surface text of the recognized entity
confidence | Number | The confidence score of this entity, normalized to the range 0 to 10
resolution | Object | The resolution of this entity produced by the NLU’s entity resolution models, whether it comes from the KG, Duckling, or a pattern-based model

Sub-fields of resolution:

Sub-Field | Type | Description
value | String | The value of this resolved entity; used by the API for fulfillment logic
denotation | String | This entity’s denotation, consisting of a platform name and source ID
metadata | String | Detailed metadata of this resolved entity. It can hold a value resolved by other models such as Duckling
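
For example, a client could flatten the entities of an interpretation into a simple name-to-value mapping. The sketch below is one possible approach; the function name entity_slots is hypothetical, and falling back to the surface text when no resolved value exists is an assumption, not documented behavior.

```python
def entity_slots(interpretation):
    """Map each entity name to its resolved value, falling back to the surface text."""
    slots = {}
    for entity in interpretation.get("entities", []):
        resolution = entity.get("resolution") or {}
        # Assumption: prefer the resolved value, else use the surface text.
        slots[entity["name"]] = resolution.get("value") or entity.get("value", "")
    return slots
```

Applied to the sample above, this yields a mapping from song to yellow and from artist to coldplay.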

Resolution

The interpretation-level resolution holds the results inferred for this semantic interpretation. They are obtained by retrieving the resolved entities and applying logic to them against Mosaix’s knowledge graph. Please check the sample code below:

      "interpretations": [
              {
                      "domain": "Music",
                      "intent": "Music.Play",
                      "entities": [...
                      ],
                      "resolution": [
                              {
                                      "value": "The Scientist",
                                      "denotation": "SPOTIFY:75JFxkI2RXiU7L9VXzMkle",
                                      "metadata": ""
                              },    
                              {    
                                      "value": "Something Just Like This",
                                      "denotation": "SPOTIFY:6RUKPb4LETWmmr3iAEQktW",
                                      "metadata": ""
                              },    
                              {    
                                      "value": "Yellow",
                                      "denotation": "SPOTIFY:3AJwUDP919kvQ9QcozQPxg",
                                      "metadata": ""
                              },    
                              {    
                                      "value": "A Rush of Blood to the Head",
                                      "denotation": "SPOTIFY:0RHX9XECH8IVI3LNgWDpmQ",
                                      "metadata": ""
                              },
                                  ...
                      ],
                      "score": 2.3955576419830322
              }    
      ]
      
Field | Type | Description
value | String | The string value of the resolved entity; used directly by the API or client for action or fulfillment
denotation | String | The source and ID of the entity that has been successfully retrieved from the Mosaix KG. This can be used to retrieve entertainment domain content in the fulfillment stage
metadata | String | Metadata of this resolved entity. It can be an object response from the entity resolution framework, or structured data from Mosaix’s KG
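
The denotation strings in the samples follow a PLATFORM:id pattern, with several pairs separated by commas. Assuming that pattern holds in general (it is inferred from the samples, not stated as a guarantee), the sketch below extracts the IDs for one platform. The function names parse_denotation and resolved_ids are hypothetical.

```python
def parse_denotation(denotation):
    """Split 'PLATFORM:id[,PLATFORM:id...]' into a {platform: id} dict."""
    ids = {}
    for part in denotation.split(","):
        platform, sep, source_id = part.strip().partition(":")
        # Skip empty or malformed parts instead of raising.
        if sep and platform and source_id:
            ids[platform.upper()] = source_id
    return ids


def resolved_ids(interpretation, platform):
    """List (value, id) pairs from the interpretation-level resolution for one platform."""
    pairs = []
    for item in interpretation.get("resolution", []):
        source_id = parse_denotation(item.get("denotation", "")).get(platform.upper())
        if source_id:
            pairs.append((item.get("value", ""), source_id))
    return pairs
```

Entries with an empty denotation are skipped, so the result contains only items that the chosen platform can actually look up.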

Text to Fulfillment (TTF)

Text-to-Fulfillment (TTF) builds on TTI and is a more complete solution. The response covers not only the interpretation but also the fulfillment of the search result. The interpretation part is the same as in the TTI response.

A Fulfillment uses the interpretations to query Mosaix’s API, which searches Mosaix’s internal knowledge graph as well as external APIs (e.g., the Spotify API). As a result, a fulfillment provides relevant content. For example, given the query "Play Taylor Swift", a fulfillment would include a “deeplink” to play the song on the device’s music player.

For each query, Mosaix NLU may generate multiple interpretations and fulfillments if the query is ambiguous. If this is the case, the API will return interpretations in a ranked format according to a confidence score.

Request

The request for TTF uses a different URL from TTI. Everything else, including the headers, is the same as for TTI. It is based on REST principles and supports POST requests only.

The standard API endpoint is:

https://api.mosaix.ai/v2/texttofulfillment

Please check the example POST request. The required headers are the same as for TTI.

              curl -X POST \
              https://api.mosaix.ai/v2/texttofulfillment \
                      -H 'Content-Type: application/json' \
                      -H 'Authorization: Bearer YOUR_API_TOKEN' \
                      -H 'Device-Id: YOUR_DEVICE_ID' \
                      -H 'Language-Code: YOUR_LANGUAGE_CODE' \
                      -H 'Region-Code: YOUR_REGION_CODE' \
                      -d '{
                              "query":"yellow by coldplay"
              }'
              
Each request must include valid headers. The placeholders are described below:

Field | Description
YOUR_API_TOKEN | Applied for and received from the Mosaix team. Required in each query.
YOUR_DEVICE_ID | Customer-defined ID. Used to track which device sent the request.
YOUR_LANGUAGE_CODE | ISO 639-1 code. Select one of the languages supported by Mosaix (see the Language section).
YOUR_REGION_CODE | ISO 3166-1 alpha-2 code. Mosaix does not restrict which region code is sent, but the fulfilled result may be affected by it.

Response

After getting a valid request, the response will be provided in JSON. It includes data such as metadata, displayResponse and speechResponse.

Please check the sample code below:

              {
                      "encoding": "utf-8",
                      "id": "8f4b96b4-36dc-430d-8a27-379fa4dd8cac",
                      "language": "en",
                      "session": {},
                      "shouldEndSession": true,
                      "status": "complete",
                      "timestamp": "2019-11-06T10:37:27.254Z",
                      "utterance": "Play Yellow by Coldplay",
                      "version": "2.11.2",
                      "interpretations": [
                              {
                                      "domain": "Music",
                                      "intent": "Music.Play",
                                      "entities": [
                                              {
                                                      "name": "song",
                                                      "resolution": {
                                                              "value": "yellow",
                                                              "denotation": "",
                                                              "metadata": ""
                                                      },
                                                      "value": "yellow",
                                                      "type": "MusicSong",
                                                      "confidence": 0
                                              },
                                              {
                                                      "name": "artist",
                                                      "resolution": {
                                                              "value": "coldplay",
                                                              "denotation": "",
                                                              "metadata": ""
                                                      },
                                                      "value": "coldplay",
                                                      "type": "MusicArtist",
                                                      "confidence": 0
                                              }
                                      ],
                                      "displayResponse": {
                                              "type": "text",
                                              "title": "OK, start playing for you",
                                              "text": "OK, start playing for you",
                                              "images": {}
                                      },
                                      "speechResponse": {
                                              "type": "text",
                                              "value": "Got it",
                                              "ssml": "<speak>Got it</speak>",
                                              "playMode": "QUENE_ADD",
                                              "audioContent": []
                                      },
                                      "reprompt": {},
                                      "hasFulfillment": true,
                                      "fulfillment": [
                                              {
                                                      FULFILLMENT_RESULT ......
                                              }
                                      ],
                                      "status": "success",
                                      "score": 1.4447158575057983,
                                      "debug": {},
                                      "hints": [],
                                      "clientSearchUrl": ""
                              }
                      ]
              }
              
The basic schema is nearly the same across domains: a standard API response, the interpretations, and the fulfillment results. Domains differ only in the contents of the fulfillment.

The following table will explain each field in response. You may also refer to the Appendix for example.

Field | Type | Description
version | String | API version number
utterance | String | User’s original query
session | String | Context info for multi-round conversation
status | String | Extra information for the client. For example, if an external API times out and the fulfillment is only partially returned, this field indicates a partial result. It can also indicate that the fulfillment failed. Enum values: complete (default value for single call), partial (for dialog mode only), failed (an error occurred)
encoding | String | Encoding method: UTF-8 or UTF-16
language | String | Detected language as a two-letter language code; defaults to en
timestamp | String | UTC timestamp of when the response was generated, in ISO string format: ‘YYYY-MM-DDTHH:mm:ss.sssZ’
id | String | Query ID for tracking purposes
interpretations | Array | The domain, intent, entities, and denotations parsed from the utterance. Explained in the following table

The interpretations field is an Array. Each object in it includes the following fields:

Field | Type | Description
domain | String | Domain info for the client to take general actions
intent | String | Intent info for the client to route to specific services
entities | Array | Entity objects recognized by the NLU engine
displayResponse | Object | Provides text or card feedback to be displayed within a GUI
speechResponse | Object | Used for playing the generated speech on the client side
reprompt | Object | Provides guidance or hints on the client side to get all required slots filled in to finish the task flow
hasFulfillment | Boolean | Indicates whether there is fulfillment data for this interpretation (optional)
fulfillment | Array | Fulfillment objects that host all final search results according to the given interpretation
status | String | Indicates the status of the fulfillment: success, fallback, partial, or failed
hints | Array | Reminds the user of available domains and intents
clientSearchUrl | String | URL to open the YouTube app or YouTube web and display the search results for returned entities
score | Number | Confidence score of this semantic interpretation from the NLU (optional)
debug | Object | Extra debug info for the client to use (optional)
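
One way a client might combine status, reprompt, hasFulfillment, and fulfillment is sketched below. The function name next_step and the returned labels are hypothetical. The sketch also assumes that a non-empty reprompt object means required slots are still missing, which the documentation does not state explicitly.

```python
def next_step(interpretation):
    """Suggest what the client should do with one TTF interpretation."""
    if interpretation.get("status") == "failed":
        return "report_error"
    if interpretation.get("reprompt"):
        # Assumption: a non-empty reprompt means required slots are still missing.
        return "ask_user"
    if interpretation.get("hasFulfillment") and interpretation.get("fulfillment"):
        return "use_fulfillment"
    # No fulfillment data: act on domain, intent, and entities locally.
    return "client_logic"
```

The order of the checks is a design choice: failures are reported first, and fulfillment data is only used once nothing further needs to be asked of the user.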

Entities and Resolution

The entity and resolution here are the same as they are in TTI response. Please refer to the previous section for reference.

displayResponse

A displayResponse defines the information to display in the front-end UI. It supports two types: text and card.

Please check the sample code.

      "displayResponse": {
             "type": "text",
             "title": "OK, start playing for you",
             "text": "OK, start playing for you",
             "images": {}
      },
      
Field | Type | Description
type | String | The type of display response: either text or card
title | String | The title to be displayed in the card (only applicable for card type)
text | String | The generated text to be rendered on the front-end UI (only applicable for text type)
images | Array | An array of images with different resolutions, used as a background or in the image section of the card (only applicable for card type)

Sub-fields of each image:

Sub-Field | Type | Description
width | Number | The image width
height | Number | The image height
url | String | Link to the image
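
A minimal sketch of how a client could turn a displayResponse into lines for a simple UI follows. The function name render_display is hypothetical, and choosing the widest image for a card is an arbitrary choice made for illustration.

```python
def render_display(display):
    """Return the text lines a simple UI would show for a displayResponse."""
    if display.get("type") == "card":
        lines = [display.get("title", "")]
        images = display.get("images") or []
        # The sample shows "images": {} while the table says Array, so accept both.
        if isinstance(images, list) and images:
            widest = max(images, key=lambda image: image.get("width", 0))
            lines.append(widest.get("url", ""))
        return lines
    # Anything else is handled as the text type.
    return [display.get("text", "")]
```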

speechResponse

A speechResponse defines what to play and in which mode. It guides how the front end provides voice feedback, including how to call a third-party TTS service.

Please check the sample code.

      "speechResponse":{
              "type": "text",
              "value": "Got it",
              "ssml": "<speak>Got it</speak>",
              "playMode": "QUENE_ADD",
              "audioContent": []
      }
      
Field | Type | Description
type | String | The type of speech response: either plain text for TTS, or SSML for playback
value | String | Raw text value to be played through TTS (only applicable for text type)
ssml | String | Text to be played through a TTS service that supports SSML (only applicable for SSML type)
playMode | String | Defines how to play the audio on the client: either add the new text to the existing queue, or flush the queue to delete the existing text and queue the new one
audioContent | Array | The pregenerated audio to be played on the client side. It can be in MP3 format or use another encoding, depending on the cloud TTS service vendor
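
The sketch below shows one way to decide what to hand to a TTS engine. The function name tts_input is hypothetical. The type value "ssml" is assumed, since only "text" appears in the samples, and giving pregenerated audio priority over synthesis is likewise an assumption.

```python
def tts_input(speech):
    """Return (is_ssml, text) for a TTS engine, or None if audio is pregenerated."""
    if speech.get("audioContent"):
        # Assumption: pregenerated audio takes priority, so nothing is synthesized.
        return None
    # Assumption: the SSML type is spelled "ssml"; only "text" appears in the samples.
    if speech.get("type") == "ssml" and speech.get("ssml"):
        return (True, speech["ssml"])
    return (False, speech.get("value", ""))
```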

Fulfillment

Fulfillment is the result of Mosaix’s backend processing the input utterance. Following the client’s configuration, the backend retrieves results from the internal search engine and external APIs, then merges the responses into structured, valid data for clients.

We will cover a few typical domains in the next few sections.

Please check the sample code.

      {
              "source": "spotify",
              "type": "track",
              "results": []
      }
      
Field | Type | Description
source | String | Indicates the source of the content provider
type | String | Indicates the type of search results for music, movies, and TV domains
results | Array | Objects providing the value of the search results from external or internal API

Confidence Score

For Domains in the “entertainment” category, such as Music, Video, TV, and Movie, a confidence score field is returned. The score ranges from 0 to 1. It represents the model’s confidence that the returned fulfillment is the correct one given the other choices considered by the model.

In addition, a displayResponse and speechResponse will refer to this score.

As a reference, the thresholds for confidenceScore are described in the following table.

confidenceScore | Match mode
0 | No match
0 ~ 0.9 | Partial match
0.9 ~ 1 | Perfect match

This table is provided for rough reference only. Practical thresholds for Mosaix responses should be determined by observation. Mosaix provides the score without any further recommendation for action in a particular use case.
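
The reference thresholds can be expressed as a small helper. This is only a sketch of the table: the function name match_mode and its return labels are hypothetical, and treating a score of exactly 0.9 as a perfect match is an assumption, since the table leaves that boundary open.

```python
def match_mode(confidence_score):
    """Classify a confidenceScore using the rough reference thresholds."""
    if not 0 <= confidence_score <= 1:
        raise ValueError("confidenceScore is documented as being between 0 and 1")
    if confidence_score == 0:
        return "none"
    # Assumption: exactly 0.9 is treated as a perfect match.
    if confidence_score < 0.9:
        return "partial"
    return "perfect"
```

In practice the 0.9 cutoff would be replaced by a threshold tuned on observed responses.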

Please check the sample code.
      "searchURL": [
              {
                       "baseURL": "spotifyApi.search()",
                       "body": {
                                "artist": "coldplay",
                                "countryCode": "US",
                                "spotifyCombination": "ArtistAndTrack",
                                "track": "yellow",
                                "type": "track"
                       },
                       "clickableURL": ""
              }
      ],  
      "confidenceScore": 1,
      "rankingScores": {
              "creditsJaccardDistance": 0,
              "isSequel": false,
              "jackDistance": 0.48333333333333334,
              "stringSimilarity": 1,
              "titleJaccardDistance": 0
      },
      

Music Fulfillment

In the Music Domain, each object in the results field is returned with the following fields:

Field | Type | Description
deeplink | String | Information required to redirect to the source app
id (from content source) | String | Denotes an index to an Entity. It could represent an artist, song, album, playlist, etc.
images (source poster) | Array | Image links for displaying search results. The first result is the default. The client can choose an alternative from this array
name | String | Name to display
artists | Array | Artist information to display
album | Object | Album information to display
popularity | Number | The popularity of the search result, taken from the default search result (optional)
type | String | Defines the returned Entity type as a song, artist, album, playlist, genre, etc. (optional)
previewUrl | String | Deeplink for preview
durationMs | Number | Duration of a returned song Entity, in milliseconds
weblink | String | Weblink to result for streaming or downloading
ownerId | String | Playlist owner ID (optional)
feedbackLink | String | Records whether the customer accessed the result for data collection purposes (optional)

Response returned for a query identified to be within the Music Domain:
      {
              "type": "track",
              "source": "spotify",
              "tags": [
                      {
                              "name": "artist",
                              "value": "coldplay"
                      }
              ],
              "results": [
                      {
                               "artists": [
                                        {
                                                 "external_urls": {
                                                      "spotify": "https://open.spotify.com/artist/4gzpq5DPGxSnKTe4SA8HAU"
                                                 },
                                                 "href": "https://api.spotify.com/v1/artists/4gzpq5DPGxSnKTe4SA8HAU",
                                                 "id": "4gzpq5DPGxSnKTe4SA8HAU",
                                                 "name": "Coldplay",
                                                 "type": "artist",
                                                 "uri": "spotify:artist:4gzpq5DPGxSnKTe4SA8HAU"
                                        }
                               ],
                               "album": {
                                        "name": "Parachutes"
                               },
                               "deeplink": "spotify:track:3AJwUDP919kvQ9QcozQPxg",
                               "durationMs": 266773,
                               "id": "3AJwUDP919kvQ9QcozQPxg",
                               "images": [
                                        {
                                             "height": 640,
                                             "url": "IMAGE_URL",
                                             "width": 640
                                        },
                                        {
                                             "height": 300,
                                             "url": "IMAGE_URL",
                                             "width": 300
                                        },
                                        {
                                             "height": 64,
                                             "url": "IMAGE_URL",
                                             "width": 64
                                        }
                               ],
                               "name": "Yellow",
                               "previewUrl": "PREVIEW_URL",
                               "type": "track",
                               "weblink": "https://open.spotify.com/track/3AJwUDP919kvQ9QcozQPxg",
                               OTHER_INFORMATION...
                      },
              ]
      }
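
As a sketch of how a client might use one entry of results, the helper below builds a compact summary for display and selects the image whose width is closest to a target size. The function names pick_image and track_summary, and the default target width of 300, are hypothetical choices for illustration.

```python
def pick_image(images, target_width):
    """Choose the image whose width is closest to target_width, or None."""
    if not images:
        return None
    return min(images, key=lambda image: abs(image.get("width", 0) - target_width))


def track_summary(result, target_width=300):
    """Collect the fields a simple music card would need from one result."""
    artists = ", ".join(artist.get("name", "") for artist in result.get("artists", []))
    image = pick_image(result.get("images", []), target_width)
    return {
        "title": result.get("name", ""),
        "artists": artists,
        "album": (result.get("album") or {}).get("name", ""),
        "deeplink": result.get("deeplink", ""),
        "image_url": image.get("url", "") if image else "",
    }
```

Every lookup uses a default so that optional or missing fields produce empty strings instead of errors.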
      

Video Fulfillment

The schema of the video response is very similar to that of music. The only difference is in the fulfillment under each interpretation. Video fulfillment covers three domains: tv, movie, and video. The response schema is the same for all three. The result field is defined as follows:

Field | Type | Description
deeplink | String | To redirect to the source app
id (from content source) | String | Denotes an index to an Entity. It could represent an artist, song, album, playlist, etc.
images (source poster) | Array | Image links for displaying search results. The first result is the default. The client can choose an alternative from this array
name | String | Name to display
channelTitle | String | Channel information to display (optional)
description | String | Description to display
type | String | Content type (optional)
weblink | String | Weblink to result for streaming or downloading
feedbackLink | String | Records whether the customer accessed the result for data collection purposes (optional)

Response returned for a query identified to be within the Video Domain:
      {
              "type":"artist",
              "source":"youtube",
              "results":[
                      {
                              "channelTitle": "Cardi B",
                              "description": "Cardi B & Bruno Mars - Please Me (Official Video) Stream/Download: https://cardib.lnk.to/PleaseMeID Directed by Bruno Mars and Florent Dechard ...",
                              "id": "3y-O-4IL-PU",
                              "deeplink": "https://www.youtube.com/embed/3y-O-4IL-PU",
                              "image": [
                                        {
                                                "url": "IMAGE_URL",
                                                "width": 120,
                                                "height": 90
                                        },    
                                        {    
                                                "url": "IMAGE_URL",
                                                "width": 320,
                                                "height": 180
                                        },    
                                        {    
                                                "url": "IMAGE_URL",
                                                "width": 480,
                                                "height": 360
                                        }
                              ],
                              "name": "Cardi B & Bruno Mars - Please Me (Official Video)",
                              "type": "video",
                              "weblink": "https://www.youtube.com/embed/3y-O-4IL-PU",
                              "feedbackLink": "FEEDBACK_LINK"
                      }    
              ]    
      }        
      

Generic Fulfillment

The Generic Domain category covers common media controls, dialog management, and some general device control commands.

The schema for the Generic Domain is the same as those above, but “hasFulfillment” under each interpretation is returned as false and the fulfillment array is empty. When “hasFulfillment” is false, the front end should make a logical decision based on the returned Domain, Intent, and Entities.

Please check the sample code.

      {
              "utterance": "play next music",
              "version": "2.10.0",
              "interpretations": [
                      {
                              "domain": "Generic",
                              "intent": "Generic.Next",
                              "entities": [],
                              "displayResponse": {
                                      "type": "text",
                                      "title": "Playing the next",
                                      "text": "Playing the next",
                                      "images": {}
                              },
                              "speechResponse": {
                                      "type": "text",
                                      "value": "",
                                      "ssml": "<speak></speak>",
                                      "playMode": "QUENE_ADD",
                                      "audioContent": []
                              },
                              "reprompt": {},
                              "hasFulfillment": false,
                              "fulfillment": [],
                              "status": "success",
                              "score": 10,
                              "debug": {},
                              "hints": [],
                              "clientSearchUrl": ""
                      }
              ]    
      }
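
A front end could implement this decision with a lookup from intent to handler, as in the sketch below. The function name dispatch_generic and the handlers mapping are hypothetical; only the intent name Generic.Next comes from the sample above.

```python
def dispatch_generic(interpretation, handlers):
    """Call the handler registered for the intent when there is no fulfillment."""
    if interpretation.get("hasFulfillment"):
        # Fulfillment data is present, so there is nothing to handle locally.
        return None
    handler = handlers.get(interpretation.get("intent"))
    if handler is None:
        return None
    return handler(interpretation.get("entities", []))
```

For instance, registering a handler under "Generic.Next" that skips to the next track in the local player would cover the sample response above.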
        

Customized Response

Mosaix supports customized responses for different clients. Please contact Mosaix at dev@mosaix.ai if you have a need for customized responses.

Help

If you still have other questions, please contact dev@mosaix.ai.