vllm.v1.worker.gpu.kv_connector
ActiveKVConnector
Bases: KVConnector
Source code in vllm/v1/worker/gpu/kv_connector.py
__init__
__init__(
    vllm_config: VllmConfig,
    kv_caches_dict: dict[str, Tensor],
)
no_forward
no_forward(
    scheduler_output: SchedulerOutput,
) -> ModelRunnerOutput
post_forward
post_forward(
    scheduler_output: SchedulerOutput,
    wait_for_save: bool = True,
) -> KVConnectorOutput | None
pre_forward
pre_forward(scheduler_output: SchedulerOutput) -> None
KVConnector
KVConnector interface used by GPUModelRunner.
no_forward
no_forward(
    scheduler_output: SchedulerOutput,
) -> ModelRunnerOutput
post_forward
post_forward(
    scheduler_output: SchedulerOutput,
    wait_for_save: bool = True,
) -> KVConnectorOutput | None
pre_forward
pre_forward(scheduler_output: SchedulerOutput) -> None
get_kv_connector
get_kv_connector(
    vllm_config: VllmConfig,
    kv_caches_dict: dict[str, Tensor],
) -> KVConnector
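Because `get_kv_connector` returns the base `KVConnector` type while `ActiveKVConnector` subclasses it, a factory of this shape can hand back either a do-nothing connector or an active one, so callers never need to branch. The sketch below shows that pattern only. It is not vLLM's implementation: `NoOpConnector`, `ToyActiveConnector`, `make_connector`, and the `kv_transfer_enabled` key are hypothetical names chosen for illustration, and the real selection criterion is not given on this page.

```python
from typing import Any


class NoOpConnector:
    """Hypothetical base connector: every hook does nothing."""

    def pre_forward(self, scheduler_output: Any) -> None:
        return None

    def post_forward(self, scheduler_output: Any, wait_for_save: bool = True) -> None:
        return None

    def no_forward(self, scheduler_output: Any) -> None:
        # Placeholder for returning an empty ModelRunnerOutput.
        return None


class ToyActiveConnector(NoOpConnector):
    """Hypothetical active connector that remembers the KV cache layers."""

    def __init__(self, config: dict, kv_caches_dict: dict[str, Any]) -> None:
        self.config = config
        # Keep the layer names so a real connector could register each cache.
        self.layer_names = sorted(kv_caches_dict)


def make_connector(config: dict, kv_caches_dict: dict[str, Any]) -> NoOpConnector:
    """Return an active connector only when KV transfer is enabled (assumed flag)."""
    if config.get("kv_transfer_enabled", False):
        return ToyActiveConnector(config, kv_caches_dict)
    return NoOpConnector()
```

With this design the caller always invokes the same three hooks, and the cost of the disabled path is a handful of empty method calls.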