OpenVINO models can be run locally through the `HuggingFacePipeline` class. To deploy a model with OpenVINO, you can specify the `backend="openvino"` parameter to trigger OpenVINO as the backend inference framework.
To use it, you should have the `optimum-intel` Python package with OpenVINO accelerator support installed.
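One way to install the dependencies, assuming a notebook environment and the `optimum[openvino,nncf]` extras together with the `langchain-huggingface` integration package:

```python
# Install optimum with OpenVINO/NNCF support and the LangChain Hugging Face integration
%pip install --upgrade-strategy eager "optimum[openvino,nncf]" langchain-huggingface --quiet
```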
### Model Loading
Models can be loaded by specifying the model parameters using the `from_model_id` method.
If you have an Intel GPU, you can specify `model_kwargs={"device": "GPU"}` to run inference on it.
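A minimal sketch of loading a model this way (the model id `gpt2` and the `ov_config` values are illustrative assumptions):

```python
from langchain_huggingface import HuggingFacePipeline

# OpenVINO runtime options; these values are illustrative
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

ov_llm = HuggingFacePipeline.from_model_id(
    model_id="gpt2",
    task="text-generation",
    backend="openvino",  # route inference through OpenVINO
    model_kwargs={"device": "CPU", "ov_config": ov_config},  # use "GPU" for an Intel GPU
    pipeline_kwargs={"max_new_tokens": 10},
)
```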
Models can also be loaded by passing in an existing `optimum-intel` pipeline directly.
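A sketch of wrapping an existing `optimum-intel` pipeline; the model id and generation settings are illustrative:

```python
from optimum.intel.openvino import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline

model_id = "gpt2"  # illustrative model
ov_config = {"PERFORMANCE_HINT": "LATENCY", "NUM_STREAMS": "1", "CACHE_DIR": ""}

# Export the model to OpenVINO on the fly and build a transformers pipeline around it
tokenizer = AutoTokenizer.from_pretrained(model_id)
ov_model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, device="CPU", ov_config=ov_config
)
ov_pipe = pipeline(
    "text-generation", model=ov_model, tokenizer=tokenizer, max_new_tokens=10
)

# Wrap the optimum-intel pipeline in the LangChain LLM interface
ov_llm = HuggingFacePipeline(pipeline=ov_pipe)
```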
### Create Chain
With the model loaded into memory, you can compose it with a prompt to form a chain. To get the response without the echoed prompt, you can bind `skip_prompt=True` to the LLM.
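A minimal sketch of such a chain, reusing `ov_llm` from above; the prompt template and question are illustrative:

```python
from langchain_core.prompts import PromptTemplate

template = """Question: {question}

Answer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

# Compose prompt and model; skip_prompt=True drops the echoed prompt from the output
chain = prompt | ov_llm.bind(skip_prompt=True)

question = "What is electroencephalography?"
print(chain.invoke({"question": question}))
```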
### Inference with local OpenVINO model
It is possible to export your model to the OpenVINO IR format with the CLI and load the model from a local folder. To reduce inference latency and model footprint, it is recommended to apply 8-bit or 4-bit weight quantization at export time with `--weight-format`:
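A sketch of the export and local loading, assuming `gpt2` as the model and `ov_model_dir` as the output folder (notebook `!` syntax for the CLI calls):

```python
# Export the model to OpenVINO IR; --weight-format applies weight-only quantization
!optimum-cli export openvino --model gpt2 --weight-format int8 ov_model_dir  # 8-bit
# !optimum-cli export openvino --model gpt2 --weight-format int4 ov_model_dir  # 4-bit

# Load the exported model from the local folder
ov_llm = HuggingFacePipeline.from_model_id(
    model_id="ov_model_dir",
    task="text-generation",
    backend="openvino",
    model_kwargs={"device": "CPU", "ov_config": ov_config},  # ov_config as defined above
    pipeline_kwargs={"max_new_tokens": 10},
)
```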
You can get additional inference speed improvements from dynamic quantization of activations and KV-cache quantization. These options can be enabled through `ov_config` as follows:
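A possible configuration; the exact property names and values depend on your OpenVINO version, so treat them as an illustrative assumption:

```python
ov_config = {
    "KV_CACHE_PRECISION": "u8",               # quantize the KV cache
    "DYNAMIC_QUANTIZATION_GROUP_SIZE": "32",  # enable dynamic quantization of activations
    "PERFORMANCE_HINT": "LATENCY",
    "NUM_STREAMS": "1",
    "CACHE_DIR": "",
}
```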
### Streaming
You can use the `stream` method to get streaming output from the LLM:
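A sketch of streaming, reusing `prompt`, `ov_llm`, and `question` from the chain example above; the generation settings are illustrative:

```python
generation_config = {"skip_prompt": True, "pipeline_kwargs": {"max_new_tokens": 100}}
chain = prompt | ov_llm.bind(**generation_config)

# Print tokens as they are generated
for chunk in chain.stream({"question": question}):
    print(chunk, end="", flush=True)
```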
For more information refer to:

- OpenVINO LLM guide.
- OpenVINO Documentation.
- OpenVINO Get Started Guide.
- RAG Notebook with LangChain.