llava-1.5-7b-hf Beta
Image-to-Text • llava-hf
LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
Usage
Workers - TypeScript
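The TypeScript example for this tab is not present in the text, so the following is only a minimal sketch of how a call might look. It assumes the model is reached through a Workers AI binding exposing a `run(model, input)` method; the `AiBinding`, `LlavaInput`, and `LlavaOutput` interfaces, the helper names, and the model identifier string are assumptions made for illustration and should be checked against the model catalog. Which input fields are required is not stated in the parameter list, so the sketch treats only `image` as required.

```typescript
// Minimal shape of the AI binding used by this sketch (assumed, not taken from the page).
interface AiBinding {
  run(model: string, input: LlavaInput): Promise<LlavaOutput>;
}

// Object form of the input, following the Parameters section.
interface LlavaInput {
  image: number[] | string; // 8-bit integers, or a binary string
  prompt?: string;
  raw?: boolean;
  temperature?: number;
  max_tokens?: number; // documented default: 512
}

// Output shape, following the Parameters section.
interface LlavaOutput {
  description: string;
}

// Assumed model identifier; confirm it before use.
const MODEL = "@cf/llava-hf/llava-1.5-7b-hf";

// Builds the object-form input from raw image bytes and a text prompt.
function buildInput(bytes: Uint8Array, prompt: string, maxTokens = 512): LlavaInput {
  return { image: Array.from(bytes), prompt, max_tokens: maxTokens };
}

// Sends the image and prompt to the model and returns the generated description.
async function describeImage(ai: AiBinding, bytes: Uint8Array, prompt: string): Promise<string> {
  const result = await ai.run(MODEL, buildInput(bytes, prompt));
  return result.description;
}
```

Inside a Worker's `fetch` handler, this could be called as `describeImage(env.AI, new Uint8Array(await request.arrayBuffer()), "What is in this image?")`, assuming the binding is named `AI` in the Worker's configuration.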
Parameters
Input
- 0 (string): Binary string representing the image contents.
- 1 (object):
  - temperature (number): Controls the randomness of the output; higher values produce more random results.
  - prompt (string): The input text prompt for the model to generate a response.
  - raw (boolean): If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
  - image:
    - 0 (array): An array of integers that represent the image data, constrained to 8-bit unsigned integer values.
      - items (number): A value between 0 and 255.
    - 1 (string): Binary string representing the image contents.
  - max_tokens (integer, default 512): The maximum number of tokens to generate in the response.
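The type constraints listed for the object form of the input can be checked on the client before a request is sent. The sketch below is one hypothetical way to do that; `checkLlavaInput` is a name invented for this example, and treating `image` as required is an assumption, since the parameter list does not say which fields are mandatory.

```typescript
// Returns a list of problems found in an object-form input; an empty list means
// the input matches the documented types. Requiring `image` is an assumption.
function checkLlavaInput(input: Record<string, unknown>): string[] {
  const problems: string[] = [];
  const { image, prompt, raw, temperature, max_tokens } = input;

  if (Array.isArray(image)) {
    // Array form: every item must fit in an 8-bit unsigned integer.
    const ok = image.every((v) => Number.isInteger(v) && v >= 0 && v <= 255);
    if (!ok) problems.push("image: every item must be an integer between 0 and 255");
  } else if (typeof image !== "string") {
    // Neither the array form nor the binary-string form.
    problems.push("image: expected an array of integers or a binary string");
  }

  if (prompt !== undefined && typeof prompt !== "string") {
    problems.push("prompt: expected a string");
  }
  if (raw !== undefined && typeof raw !== "boolean") {
    problems.push("raw: expected a boolean");
  }
  if (temperature !== undefined && typeof temperature !== "number") {
    problems.push("temperature: expected a number");
  }
  if (max_tokens !== undefined && !Number.isInteger(max_tokens)) {
    problems.push("max_tokens: expected an integer");
  }
  return problems;
}
```

Returning a list of messages rather than throwing on the first failure lets a caller report every problem in one pass.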
Output
- description (string)
API Schemas
The following schemas are based on JSON Schema.
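The schemas themselves are not included in this text. As a rough illustration only, the parameter list in the Parameters section could be expressed in JSON Schema form along the following lines, written here as TypeScript constants. This is a reconstruction, not the published schema: the use of `oneOf` for the numbered alternatives and the `minimum`/`maximum` keywords for the 0 to 255 range are assumptions, and the constant names are hypothetical.

```typescript
// Hypothetical reconstruction of the input schema from the parameter list.
const inputSchema = {
  oneOf: [
    // Alternative 0: the image as a binary string.
    { type: "string", description: "Binary string representing the image contents." },
    // Alternative 1: an object carrying the image plus generation options.
    {
      type: "object",
      properties: {
        temperature: { type: "number" },
        prompt: { type: "string" },
        raw: { type: "boolean" },
        image: {
          oneOf: [
            { type: "array", items: { type: "number", minimum: 0, maximum: 255 } },
            { type: "string" },
          ],
        },
        max_tokens: { type: "integer", default: 512 },
      },
    },
  ],
} as const;

// Hypothetical reconstruction of the output schema.
const outputSchema = {
  type: "object",
  properties: {
    description: { type: "string" },
  },
} as const;
```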