Vision API
teleportHQ Vision API is a computer vision API trained specifically to detect atomic UI elements in pictures of hand-drawn wireframes (as seen in the picture above). It uses an architecture based on ResNet-101 for feature extraction and Faster R-CNN for bounding-box proposals.
The machine learning model was built and trained using TensorFlow.
List of elements it can distinguish: paragraph, label, header, button, checkbox, radiobutton, rating, toggle, dropdown, listbox, textarea, textinput, datepicker, stepperinput, slider, progressbar, image, video.
The API is currently in closed alpha, but feel free to contact us if you want early access.
Guideline
We had to settle on some conventions to obtain better results; you can learn more in this blog post.
Using the Vision API
Request
Send all requests to the API endpoint: https://api.vision.teleporthq.io/v2/detection
Request header
Make sure to add a Content-Type key with the value application/json and a Teleport-Token key with the token provided by us.
Request body
The body of the request is a JSON object with two keys: image and threshold.
image is a required string parameter that denotes the direct URL to a publicly available JPG or PNG image.
threshold is an optional parameter. Its default value is 0.1. The detection model outputs a confidence score for each detection (between 0 and 1), and the response omits detections with a confidence lower than this threshold.
Request body example:
{
  "image": "https://i.imgur.com/HzTWzLS.jpg",
  "threshold": 0.5
}
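As a minimal sketch, the body described above could be assembled in Python as follows. The helper name build_payload is hypothetical and not part of the API, and the URL and range checks are assumptions drawn from the parameter descriptions rather than documented server behavior.

```python
import json


def build_payload(image_url, threshold=None):
    """Return the request body as a JSON string.

    build_payload is a hypothetical helper, not part of the Vision API.
    """
    # Assumption: the API expects an http(s) URL it can fetch directly.
    if not image_url.lower().startswith(("http://", "https://")):
        raise ValueError("image must be a direct URL to a public jpg or png")

    payload = {"image": image_url}

    # threshold is optional; leaving it out keeps the documented default (0.1).
    if threshold is not None:
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")
        payload["threshold"] = threshold

    return json.dumps(payload)
```

Calling build_payload("https://i.imgur.com/HzTWzLS.jpg", 0.5) yields a body equivalent to the example above, while omitting the second argument sends only the image key.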
Request example
curl \
  -X POST https://api.vision.teleporthq.io/v2/detection \
  -H 'Content-Type: application/json' \
  -H 'Teleport-Token: your_token' \
  -d '{
    "image": "https://i.imgur.com/HzTWzLS.jpg",
    "threshold": 0.5
  }'
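The same request can be sketched in Python using only the standard library. The function names build_request and detect and the 30-second timeout are illustrative assumptions; only the endpoint, the two headers, and the body keys come from this page.

```python
import json
import urllib.request

API_URL = "https://api.vision.teleporthq.io/v2/detection"


def build_request(image_url, token, threshold=0.5):
    """Build the POST request without sending it (hypothetical helper)."""
    body = json.dumps({"image": image_url, "threshold": threshold}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Teleport-Token": token,  # token provided by teleportHQ
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def detect(image_url, token, threshold=0.5, timeout=30):
    """Send the request and return the decoded JSON response."""
    request = build_request(image_url, token, threshold)
    # urlopen raises urllib.error.HTTPError for non-2xx status codes.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as detect("https://i.imgur.com/HzTWzLS.jpg", "your_token") would return the list of detections. Error responses are not described on this page, so the sketch simply lets urllib's exceptions propagate to the caller.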
Response
If your request is valid, you will receive a JSON response with the following structure:
[
  {
    "box": [y, x, height, width],
    "detectionClass": numeric_label,
    "detectionString": string_label,
    "score": confidence_rating
  },
  ...
]
The JSON contains a list of objects, each of which corresponds to a detected atomic UI element in the image sent in the request. Every key appears in every object of the response array.
box contains the coordinates of the bounding box surrounding the detected element, in the order [y, x, height, width]. x and y are the coordinates of the top left corner of the box, and width and height are self-explanatory. All coordinates are normalized to the range [0, 1], where (0, 0) is the top left corner of your image and (1, 1) is the bottom right corner. In other words, to get pixel coordinates, multiply x and width by the width of your image, and y and height by the height of your image.
detectionClass is the numeric class of the detection.
detectionString is the human-readable label of the detection.
score represents how confident the algorithm is that the predicted object is a correct / valid one. It takes values in [0, 1], where 1 represents 100% confidence in the detection.
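The conversion from normalized to pixel coordinates can be sketched as follows. The function name box_to_pixels and the returned key names are hypothetical, and rounding to the nearest pixel is an assumption; the API itself only returns the normalized values.

```python
def box_to_pixels(box, image_width, image_height):
    """Convert a normalized [y, x, height, width] box to pixel units.

    box_to_pixels is an illustrative helper, not part of the Vision API.
    """
    # The response orders the values as y, x, height, width.
    y, x, height, width = box
    return {
        "left": round(x * image_width),        # x scales with image width
        "top": round(y * image_height),        # y scales with image height
        "width": round(width * image_width),
        "height": round(height * image_height),
    }
```

For a 1000 by 800 pixel image, the box [0.25, 0.5, 0.1, 0.2] maps to a rectangle whose top left corner sits at (500, 200) and that measures 200 by 80 pixels.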
The detectionClass to detectionString mapping is done according to this dictionary:
{
  1: 'paragraph',
  2: 'dropdown',
  3: 'checkbox',
  4: 'radiobutton',
  5: 'rating',
  6: 'toggle',
  7: 'textarea',
  8: 'datepicker',
  9: 'stepperinput',
  10: 'slider',
  11: 'video',
  12: 'label',
  13: 'table',
  14: 'list',
  15: 'header',
  16: 'button',
  17: 'image',
  18: 'linebreak',
  19: 'container',
  20: 'link',
  21: 'textinput'
}
Response example
Full response here.
[
  {
    "box": [
      0.06640399247407913,
      0.18573421239852905,
      0.0626835897564888,
      0.43779563903808594
    ],
    "detectionClass": 15,
    "detectionString": "header",
    "score": 0.995826005935669
  },
  {
    "box": [
      0.16810636222362518,
      0.18520960211753845,
      0.04797615110874176,
      0.17563629150390625
    ],
    "detectionClass": 16,
    "detectionString": "button",
    "score": 0.9924671053886414
  },
  {
    "box": [
      0.8350381255149841,
      0.5098391771316528,
      0.05998152494430542,
      0.23138082027435303
    ],
    "detectionClass": 16,
    "detectionString": "button",
    "score": 0.9921296238899231
  }
]
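Once a response like this has been decoded, one plausible post-processing step is to drop low-confidence detections and arrange the rest from top to bottom. The sketch below assumes the response has already been parsed into a list of dictionaries; the function name reading_order and the sort criterion are illustrative choices, not behavior of the API.

```python
def reading_order(detections, min_score=0.0):
    """Keep detections scoring at least min_score, sorted top-to-bottom.

    reading_order is a hypothetical helper; ties on y are broken
    left-to-right using x.
    """
    kept = [d for d in detections if d["score"] >= min_score]
    # box is [y, x, height, width], so index 0 is y and index 1 is x.
    return sorted(kept, key=lambda d: (d["box"][0], d["box"][1]))
```

Filtering on the client this way makes it possible to request a low threshold once and then experiment with stricter cut-offs without calling the API again.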
Previous version
The previous version of the API is still available at this endpoint:
https://api.vision.teleporthq.io/v1/detection
The detectionClass to detectionString mapping for this previous version is done according to this dictionary:
{
  1: "paragraph",
  2: "label",
  3: "header",
  4: "button",
  5: "checkbox",
  6: "radiobutton",
  7: "rating",
  8: "toggle",
  9: "dropdown",
  10: "listbox",
  11: "textarea",
  12: "textinput",
  13: "datepicker",
  14: "stepperinput",
  15: "slider",
  16: "progressbar",
  17: "image",
  18: "video"
}
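Because the numeric classes differ between the two versions, code written against v1 cannot reuse its class numbers with v2. The sketch below translates a v1 class to its v2 counterpart by matching labels. It assumes that identical labels denote the same element in both versions, which this page does not state explicitly, and the name v1_to_v2_class is hypothetical.

```python
V1_LABELS = {
    1: "paragraph", 2: "label", 3: "header", 4: "button", 5: "checkbox",
    6: "radiobutton", 7: "rating", 8: "toggle", 9: "dropdown", 10: "listbox",
    11: "textarea", 12: "textinput", 13: "datepicker", 14: "stepperinput",
    15: "slider", 16: "progressbar", 17: "image", 18: "video",
}

V2_LABELS = {
    1: "paragraph", 2: "dropdown", 3: "checkbox", 4: "radiobutton", 5: "rating",
    6: "toggle", 7: "textarea", 8: "datepicker", 9: "stepperinput", 10: "slider",
    11: "video", 12: "label", 13: "table", 14: "list", 15: "header",
    16: "button", 17: "image", 18: "linebreak", 19: "container", 20: "link",
    21: "textinput",
}

# Reverse lookup: v2 label -> v2 numeric class.
V2_CLASSES = {label: number for number, label in V2_LABELS.items()}


def v1_to_v2_class(v1_class):
    """Return the v2 class with the same label, or None if there is none."""
    label = V1_LABELS.get(v1_class)
    return V2_CLASSES.get(label)
```

Under this label-matching assumption, the v1 labels listbox and progressbar have no identically named v2 class, so the function returns None for them. Comparing detectionString rather than detectionClass avoids the translation altogether.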
How do I get a Teleport-Token?
If you are interested in using this API, feel free to get in touch with us via the following form.