Introduction
Auvious follows modern best practices to provide a robust and developer-friendly API over HTTP. In a nutshell, the API design follows these basic principles:
- HTTP/REST-like APIs organized per feature/aggregate.
- OAuth2.0 with JWT Bearer Token for endpoint protection.
- MQTT over secure WebSocket (wss) for asynchronous event delivery.
Authentication
With a few exceptions, endpoints are protected and require a valid JWT Bearer Token in the 'Authorization' HTTP header, as in the following example:
POST /api/test/endpoint HTTP/1.1
Host: auvious.video
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJzZXJ2aWNlIiwiZXhwIjoxNjMzNTE2NDIxLCJ1c2VySWQiOiJzb21lYm9keSIsImF1dGhvcml0aWVzIjpbIlJPTEVfVVNFUiJdLCJqdGkiOiI0NDM2OGY5Ni0yM2Q0LTQ4NzUtYmY3OC0wOGFmNjJmZWI5YTEiLCJjbGllbnRfaWQiOiJjbGllbnQifQ.IAzxFMs6f4xN2COp5IKM-qUVP2Z1iNxzd-IRzsooKxpWTW6HXfRECgdAVCI--A_5IuGvZcNprgFsPl_3JgWSDOF-BUOaCwFN0BxO0Ir8xhsNx_yUY2bogz9_IxI4FWoepSCteLxK0y9krzCN6SrlMsSlerr5mtwOUZ1DJCURIpDGrHzckSRQ_1fY4McyB6N1I6oRB21sqGuflrLLIatZy1sab2hmsNcJQw91_Dp8jWYZ2LgSWSCzMOIgViaAMM9JncomVbfousnkQZ2njrd_2AdwjOBpsmWK2Xi6GUrkC0O1SLC7byuN-0vZtm0e0p0-XJn2p5jwS_KuMMFyYgTEkA
content-type: application/json
content-length: 77
....
This token is digitally signed with an asymmetric key by our security/identity service upon its creation, which means that it cannot be tampered with without becoming invalid for use with the REST endpoints.
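As a minimal sketch of how a client might handle such a token, the Python snippet below builds the 'Authorization' header and reads the token's claims. The function names are hypothetical and not part of any Auvious SDK. The decoder does not verify the signature, so it is only suitable for inspecting a token on the client side (for example to read the standard `exp` expiry claim), never for trusting its contents.

```python
import base64
import json


def bearer_header(token: str) -> dict:
    """Build the 'Authorization' header expected by protected endpoints."""
    return {"Authorization": f"Bearer {token}"}


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature.

    Hypothetical helper for inspection only; the server remains the
    authority on whether a token is valid.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a token of the form header.payload.signature")
    payload_b64 = parts[1]
    # JWT segments are base64url-encoded with '=' padding stripped; restore it.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

A client could call `decode_jwt_payload(token)["exp"]` to decide when to request a fresh token before the current one expires.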
There are different procedures for acquiring a valid token, depending on the type of user that needs to access the service endpoints.
Contact Center Employees
Employees of contact centers usually authenticate using some kind of SSO mechanism with the identity server of the user's organization. The following integrations are currently supported:
- Genesys Cloud: This is an OAuth2.0 Authorization Code Flow based integration, and it is set up automatically when accessed from Genesys Cloud AppFoundry. More technical details are provided here.
- OpenID Connect: This is also based on the OAuth2.0 Authorization Code Flow, but the parameters must be set up manually by the Auvious team, by provisioning a suitable application. More technical details are provided here.
Contact Center Applications
These are server-side applications used for various integration purposes. They need to authenticate and have management access in order to perform tasks such as:
- create video rooms 
- retrieve various reports 
- ...
These applications can authenticate using the OAuth2.0 Client Credentials Flow. To use this flow, client credentials must first be created, usually through the settings page, which is available to users with administrator privileges. The service can then obtain a valid JWT by making a POST request to https://auvious.video/security/oauth/token with content type application/x-www-form-urlencoded and the following parameters:
- client_id 
- client_secret 
- grant_type=client_credentials 
Here's a request example:
POST /security/oauth/token HTTP/1.1
Host: auvious.video
Content-Type: application/x-www-form-urlencoded
Content-Length: 65

grant_type=client_credentials&client_id=test&client_secret=secret
Here's the same request using the curl command:
curl -i https://auvious.video/security/oauth/token -H 'content-type: application/x-www-form-urlencoded' -d 'grant_type=client_credentials&client_id=test&client_secret=secret'
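The same request can be sketched in Python using only the standard library. The function names are hypothetical. The shape of the response is also an assumption: this page does not describe it, so the sketch assumes the JSON body with an `access_token` field that OAuth2.0 token endpoints conventionally return (RFC 6749, section 5.1).

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://auvious.video/security/oauth/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Prepare the form-encoded POST for the Client Credentials Flow."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the token.

    Assumption: the response is JSON containing an "access_token" field.
    """
    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as resp:
        return json.load(resp)["access_token"]
```

Separating request construction from sending keeps the first function free of network access, which makes it easy to unit test.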
Customers
The requirement here is to provide a secure and convenient way for customers to create or join video calls. For this reason Auvious provides a ticket mechanism that suits several use cases. In the most common scenario, an agent creates a room and then clicks a share room URL button, which under the hood creates a ticket that allows anyone with the link to join the room. For the usual video calling app, the URL is constructed from the base URL of the app plus '/t' and then '/{ticket-id}', e.g. in "https://auvious.video/t/nwi-ssh" the ticket id is "nwi-ssh".
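The URL convention described above can be illustrated with a short Python sketch. Both helper names are hypothetical, and the parsing rule (the ticket id is the path segment that follows 't') is inferred from the single example given here rather than from a formal specification.

```python
from typing import Optional
from urllib.parse import urlsplit


def ticket_url(app_base_url: str, ticket_id: str) -> str:
    """Build a shareable link: base URL of the app + '/t' + '/{ticket-id}'."""
    return f"{app_base_url.rstrip('/')}/t/{ticket_id}"


def ticket_id_from_url(url: str) -> Optional[str]:
    """Extract the ticket id from a link, or return None if the path
    does not end in '/t/{ticket-id}'."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if len(segments) >= 2 and segments[-2] == "t":
        return segments[-1]
    return None
```

With these helpers, `ticket_url("https://auvious.video", "nwi-ssh")` yields the link from the example, and `ticket_id_from_url` recovers "nwi-ssh" from it.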
Basic Video Call (Conference) Flow
Video call functionality is based on WebRTC, which allows anyone with a modern browser to use apps built on the Auvious platform. Each call participant sends their media to the Auvious platform, and more specifically to the SFU (Selective Forwarding Unit) service, which is based on the open-source Janus WebRTC server. To use less bandwidth, each participant receives the other participants' media from the SFU. So although WebRTC is a peer-to-peer technology, in WebRTC terms one peer is the participant's browser and the other peer is the SFU.
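To make the bandwidth argument concrete, the Python sketch below compares the number of media streams each participant handles in a full peer-to-peer mesh with the number handled when going through an SFU. It is a simplified model, assuming every participant publishes exactly one stream and views all the others. The function names are illustrative only.

```python
def uplink_streams_mesh(participants: int) -> int:
    """In a full mesh, each participant sends its media to every other peer."""
    return max(participants - 1, 0)


def uplink_streams_sfu(participants: int) -> int:
    """With an SFU, each participant sends its media once, to the SFU."""
    return 1 if participants > 0 else 0


def downlink_streams(participants: int) -> int:
    """In both topologies a participant receives one stream per other participant."""
    return max(participants - 1, 0)
```

Under this model, in a five-party call each browser uploads four copies of its media in a mesh but only one copy with an SFU, while the number of streams it receives stays at four in both cases.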
In order to establish a call the following steps must take place:
- Authentication. At this step the web or mobile application must obtain a valid JWT token, which will grant access to Auvious HTTP APIs.
- Registration. At this step a user endpoint is created, which represents a user session. It is identified by a unique id, the userEndpointId, which is used to correlate a user with their device or browser window/tab. A connection to the MQTT service must also be established at this step, with the JWT passed as the username. After a successful connection to the MQTT service, the client must subscribe to users/endpoints/{userEndpointId}. This subscription enables asynchronous delivery of any events that the user must receive in that specific 'session'.
- Conference Creation. A conference must be created. This is where the users will meet.
- Join Conference. Users need to join the conference.
- Start receiving conference events. From this point on, events related to the conference the user just joined will start being delivered. However, an event is not of much use without first knowing the current state of the conference. If you get an event that says a user left, what are you going to do if you didn't know the user had joined in the first place? So you need to store any events received at this point until you have a valid state.
- Fetch conference state. Once you have the state, you can start processing conference events, including the ones you stored. Conference events carry a version, which can and should be used to detect duplicates. Events that change the conference state are always sent using at-least-once (MQTT QoS=1) semantics.
- Publish Stream To Conference. After joining, users need to publish media to the conference so that other users can view it. Afterwards, ICE candidates must be sent asynchronously using this endpoint, one or more times. The client app will need to get the available ICE servers from this endpoint.
- View Other Participant Streams. For every stream in the conference, the client app will need to establish viewer connections. ICE candidates must also be sent afterwards using this endpoint.
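The event-handling part of the steps above (subscribe, store events until the state is known, then process them while dropping duplicates) can be sketched in Python as follows. This is an illustration under stated assumptions, not the Auvious client implementation. The class and function names are hypothetical. Each event is assumed to be a dict with an integer "version" field that increases with every state change, and the fetched conference state is assumed to expose a comparable version.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]  # assumed shape: {"version": int, ...}


def endpoint_topic(user_endpoint_id: str) -> str:
    """MQTT topic to subscribe to after registration."""
    return f"users/endpoints/{user_endpoint_id}"


class ConferenceEventBuffer:
    """Stores events until the conference state is fetched, then replays them."""

    def __init__(self, apply_event: Callable[[Event], None]) -> None:
        self._apply_event = apply_event
        self._pending: List[Event] = []
        self._version: Optional[int] = None  # None until the state is fetched

    def on_event(self, event: Event) -> None:
        """Call for every conference event delivered over MQTT."""
        if self._version is None:
            self._pending.append(event)  # no valid state yet: store for later
        else:
            self._apply_if_new(event)

    def on_state_fetched(self, state_version: int) -> None:
        """Call once the conference state has been fetched over HTTP."""
        self._version = state_version
        pending, self._pending = self._pending, []
        for event in sorted(pending, key=lambda e: e["version"]):
            self._apply_if_new(event)

    def _apply_if_new(self, event: Event) -> None:
        # QoS 1 is at-least-once, so the same event may arrive more than once.
        # Assumption: anything at or below the known version is already reflected.
        if event["version"] <= self._version:
            return
        self._version = event["version"]
        self._apply_event(event)
```

In use, the MQTT message callback would pass each decoded event to `on_event`, and the code that fetches the conference state would call `on_state_fetched` with the version found in that state.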
There are also other operations, for example to unpublish a stream, stop viewing streams, or leave the conference.
As you can guess from reading this far, establishing and maintaining a multi-party call is not a simple task, so the following diagram may help: