vllm.model_executor.layers.fused_moe.router.base_router
BaseRouter
Bases: FusedMoERouter
Base router class that provides common functionality for all router implementations.
This class implements the template method pattern: select_experts() handles common pre-processing and post-processing, and delegates the actual routing logic to the abstract _compute_routing() method.
Source code in vllm/model_executor/layers/fused_moe/router/base_router.py
__init__
__init__(
top_k: int,
global_num_experts: int,
eplb_state: EplbLayerState,
enable_eplb: bool = False,
indices_type_getter: Callable[[], dtype | None] | None = None,
)
Note: The indices dtype may not be known when the router is constructed, so a callback that returns it at runtime is supplied instead. This is because the indices dtype is determined by the modular kernels, which are created after the MoE layer and its router are constructed.
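The late-binding idea behind indices_type_getter can be sketched in plain Python. This is a hypothetical illustration, not vLLM code: LazyDtypeRouter and layer_state are made-up names, and dtypes are plain strings standing in for torch.dtype. The point is that the router stores the callback and only calls it at routing time, so a kernel created after the router is still observed.

```python
from typing import Callable, Optional


class LazyDtypeRouter:
    """Hypothetical router that resolves the indices dtype only when asked."""

    def __init__(
        self, indices_type_getter: Optional[Callable[[], Optional[str]]] = None
    ):
        # Store the callback, not its result: the dtype may not exist yet.
        self.indices_type_getter = indices_type_getter

    def _get_indices_type(self) -> Optional[str]:
        # Evaluated at routing time, so a kernel created later is still seen.
        if self.indices_type_getter is None:
            return None
        return self.indices_type_getter()


# Stand-in for layer-owned state; no kernel has been created yet.
layer_state = {"kernel_indices_dtype": None}
router = LazyDtypeRouter(lambda: layer_state["kernel_indices_dtype"])

before = router._get_indices_type()  # None: kernel not created yet
layer_state["kernel_indices_dtype"] = "int32"  # kernel built after the router
after = router._get_indices_type()  # "int32": picked up lazily
```

Had the router copied the dtype into an attribute in its constructor, it would have stayed None; the callback defers the lookup until the value actually exists.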
_apply_eplb_mapping
Apply EPLB mapping to convert logical expert IDs to physical expert IDs.
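Conceptually, the logical-to-physical conversion is a table lookup: a logical expert may have several physical replicas, and each routed token is sent to one of them. The sketch below assumes a per-expert replica table padded with -1 and a replica count per expert, and picks replicas round-robin by token index. The function name, the table layout, and the selection policy are all assumptions for illustration; they are not taken from vLLM's EplbLayerState.

```python
def map_logical_to_physical(
    topk_ids: list[list[int]],
    logical_to_physical: list[list[int]],
    replica_count: list[int],
) -> list[list[int]]:
    """Hypothetical EPLB-style mapping from logical to physical expert IDs.

    logical_to_physical[e] lists the physical slots holding logical expert e
    (padded with -1); replica_count[e] is how many entries are valid.
    Replica choice is round-robin by token index (an assumed policy).
    """
    physical = []
    for token_idx, row in enumerate(topk_ids):
        physical.append(
            [logical_to_physical[e][token_idx % replica_count[e]] for e in row]
        )
    return physical


# 3 logical experts on 4 physical slots; logical expert 0 is replicated
# in slots 0 and 3, experts 1 and 2 have a single copy each.
table = [[0, 3], [1, -1], [2, -1]]
counts = [2, 1, 1]
mapped = map_logical_to_physical([[0, 1], [0, 2]], table, counts)
# mapped == [[0, 1], [3, 2]]: the two tokens hit different replicas of expert 0
```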
_compute_routing abstractmethod
_compute_routing(
hidden_states: Tensor,
router_logits: Tensor,
indices_type: dtype | None,
) -> tuple[Tensor, Tensor]
Compute the actual routing logic.
This method must be implemented by subclasses to provide the specific routing algorithm (e.g., grouped_topk, fused_topk, custom routing, etc.).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_states | Tensor | Input hidden states | required |
| router_logits | Tensor | Router logits for expert selection | required |
| indices_type | dtype \| None | Desired dtype for expert indices (may be None) | required |
Returns:
| Type | Description |
|---|---|
| tuple[Tensor, Tensor] | Tuple of (topk_weights, topk_ids) |
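A subclass's _compute_routing must return (topk_weights, topk_ids). As an illustration of what such a body can compute, here is one common scheme (softmax over experts, take the top-k, optionally renormalize the selected weights) written in plain Python over nested lists. softmax_topk_routing and its renormalize flag are hypothetical names; real subclasses operate on torch tensors, and which scheme a given router uses is up to that subclass.

```python
import math


def softmax_topk_routing(router_logits, top_k, renormalize=True):
    """Hypothetical stand-in for a _compute_routing body (softmax + top-k).

    router_logits: one list of per-expert logits per token.
    Returns (topk_weights, topk_ids), each shaped [num_tokens][top_k].
    """
    topk_weights, topk_ids = [], []
    for logits in router_logits:
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Indices of the top_k largest probabilities, highest first.
        ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        weights = [probs[i] for i in ids]
        if renormalize:
            # Rescale so the selected experts' weights sum to 1 per token.
            s = sum(weights)
            weights = [w / s for w in weights]
        topk_weights.append(weights)
        topk_ids.append(ids)
    return topk_weights, topk_ids


weights, ids = softmax_topk_routing([[2.0, 0.0, 1.0]], top_k=2)
# ids == [[0, 2]]; weights[0] sums to 1 because renormalize=True
```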
_convert_indices_dtype
Convert topk_ids to the desired dtype if needed.
_get_indices_type
_get_indices_type() -> dtype | None
Get the desired indices dtype from the getter function.
_validate_eplb_state
Validate that EPLB state is properly initialized if EPLB is enabled.
select_experts
Route the input hidden states to the top-k experts based on the router logits.
This method implements the template method pattern:
1. Validates the EPLB state.
2. Gets the indices type.
3. Calls _compute_routing() to get topk_weights and topk_ids.
4. Applies the EPLB mapping if enabled.
5. Converts the indices dtype if needed.
Returns:
| Type | Description |
|---|---|
| tuple[Tensor, Tensor] | (topk_weights, topk_ids): the computed expert weights and expert IDs. Compatibility: when EPLB is not enabled, the returned IDs are equivalent to global logical IDs, so they should be compatible with plain MoE implementations without redundant experts. |
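The five steps of select_experts can be sketched in plain Python to show how the template method delegates to _compute_routing. This is a hypothetical, simplified mirror of the documented flow, not vLLM's implementation: SketchRouter, GreedyTopK, and the flat logical_to_physical list are illustrative names, "tensors" are nested lists, and a "dtype" is a Python type (such as float) applied element-wise. The real router works on torch tensors and an EplbLayerState.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class SketchRouter(ABC):
    """Hypothetical, simplified mirror of the documented select_experts flow."""

    def __init__(
        self,
        top_k: int,
        enable_eplb: bool = False,
        logical_to_physical: Optional[list[int]] = None,
        indices_type_getter: Optional[Callable[[], Optional[type]]] = None,
    ):
        self.top_k = top_k
        self.enable_eplb = enable_eplb
        self.logical_to_physical = logical_to_physical  # hypothetical EPLB table
        self.indices_type_getter = indices_type_getter

    def _validate_eplb_state(self) -> None:
        if self.enable_eplb and self.logical_to_physical is None:
            raise ValueError("EPLB enabled but no logical->physical map set")

    def _get_indices_type(self) -> Optional[type]:
        return self.indices_type_getter() if self.indices_type_getter else None

    @abstractmethod
    def _compute_routing(self, hidden_states, router_logits, indices_type):
        """Subclasses return (topk_weights, topk_ids) with logical expert IDs."""

    def _apply_eplb_mapping(self, topk_ids):
        return [[self.logical_to_physical[e] for e in row] for row in topk_ids]

    def _convert_indices_dtype(self, topk_ids, indices_type):
        if indices_type is None:
            return topk_ids  # no dtype requested: leave IDs unchanged
        return [[indices_type(e) for e in row] for row in topk_ids]

    def select_experts(self, hidden_states, router_logits):
        self._validate_eplb_state()  # 1. validate EPLB state
        indices_type = self._get_indices_type()  # 2. resolve indices dtype
        weights, ids = self._compute_routing(  # 3. subclass-specific routing
            hidden_states, router_logits, indices_type
        )
        if self.enable_eplb:
            ids = self._apply_eplb_mapping(ids)  # 4. logical -> physical IDs
        ids = self._convert_indices_dtype(ids, indices_type)  # 5. dtype cast
        return weights, ids


class GreedyTopK(SketchRouter):
    """Hypothetical subclass: top_k largest logits, raw logits as weights."""

    def _compute_routing(self, hidden_states, router_logits, indices_type):
        ids = [
            sorted(range(len(row)), key=lambda i: -row[i])[: self.top_k]
            for row in router_logits
        ]
        weights = [[row[i] for i in r] for row, r in zip(router_logits, ids)]
        return weights, ids


logits = [[0.1, 0.9, 0.5, 0.3]]
plain = GreedyTopK(top_k=2).select_experts(None, logits)
# plain == ([[0.9, 0.5]], [[1, 2]]): logical IDs, as with a non-EPLB MoE
eplb = GreedyTopK(top_k=2, enable_eplb=True, logical_to_physical=[4, 5, 6, 7])
mapped = eplb.select_experts(None, logits)
# mapped == ([[0.9, 0.5]], [[5, 6]]): same choice, physical IDs
```

Because the shared steps live in the base class, a subclass only supplies _compute_routing; validation, EPLB remapping, and dtype conversion are applied uniformly regardless of the routing algorithm.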