batch_size = 1

sequence_length:
- during prompt processing: sequence_length = num_input_tokens, which can be 512 or even 1024
- during decoding/generation: sequence_length = 1

Each entry below lists one graph node, its op type, and the inferred shapes of its dynamically shaped inputs and outputs; () denotes a scalar, and named dimensions are symbolic (dynamic).

Operation: /model/embed_tokens/Gather (Gather) Inputs: [] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/Shape (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/Shape_2 (Shape) Inputs: [] Outputs: [(2,)]
Operation: /model/layers.1/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.2/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.3/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.4/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.5/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.6/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.7/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.9/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.10/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.11/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.12/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.13/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.14/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.15/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.16/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.17/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.18/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.19/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.20/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.21/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.22/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: /model/layers.23/self_attn/Shape_3 (Shape) Inputs: [] Outputs: [(4,)]
Operation: Transpose_6764 (Transpose) Inputs: [] Outputs: [(896, 151936)]
Operation: /model/Unsqueeze_6 (Unsqueeze) Inputs: [] Outputs: [('batch_size', 1, 'past_sequence_length + 1')]
Operation: /model/Gather (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/embed_tokens/Gather_output_0_DequantizeLinear (DequantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/Shape_1 (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)]
Operation: /model/Gather_2 (Gather) Inputs: [(2,)] Outputs: [()]
Operation: /model/layers.1/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.2/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.3/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.4/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.5/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.6/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.7/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.8/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.9/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.10/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.11/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.12/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.13/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.14/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.15/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.16/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.17/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.18/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.19/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.20/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.21/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.22/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.23/self_attn/Gather_3 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 1, 'past_sequence_length + 1')] Outputs: [('batch_size', 1, 1, 'past_sequence_length + 1')]
Operation: /model/layers.0/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()]
Operation: /model/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/Gather_3 (Gather) Inputs: [(3,)] Outputs: [()]
Operation: /model/Cast_3 (Cast) Inputs: [('batch_size', 1, 1, 'past_sequence_length + 1')] Outputs: [('batch_size', 1, 1, 'past_sequence_length + 1')]
Operation: /model/Range_1 (Range) Inputs: [()] Outputs: [('past_sequence_length + 1',)]
Operation: /model/layers.0/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/Add (Add) Inputs: [(), ()] Outputs: [()]
Operation: /model/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/Unsqueeze_4 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.0/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(2,)]
Operation: /model/Concat_1 (Concat) Inputs: [(1,)] Outputs: [(4,)]
Operation: /model/Range (Range) Inputs: [(), ()] Outputs: [('sequence_length',)]
Operation: /model/layers.0/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/ConstantOfShape (ConstantOfShape) Inputs: [(2,)] Outputs: [('sequence_length', 'past_sequence_length + 1')]
Operation: /model/Reshape (Reshape) Inputs: [('sequence_length',)] Outputs: [('sequence_length', 1)]
Operation: /model/layers.0/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/Equal (Equal) Inputs: [(4,)] Outputs: [(4,)]
Operation: /model/Trilu (Trilu) Inputs: [('sequence_length', 'past_sequence_length + 1')] Outputs: [('sequence_length', 'past_sequence_length + 1')]
Operation: /model/Greater (Greater) Inputs: [('past_sequence_length + 1',), ('sequence_length', 1)] Outputs: [('sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.0/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/Where (Where) Inputs: [(4,), (4,)] Outputs: [(4,)]
Operation: Cast_2427 (Cast) Inputs: [('sequence_length', 'past_sequence_length + 1')] Outputs: [('sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.0/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/Mul (Mul) Inputs: [('sequence_length', 'past_sequence_length + 1'), ('sequence_length', 'past_sequence_length + 1')] Outputs: [('sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.0/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)]
Operation: /model/layers.0/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()]
Operation: /model/layers.0/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.0/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.0/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.0/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.0/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.0/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.0/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
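The /model/layers.0/input_layernorm chain above (Pow → ReduceMean → Add → Sqrt → Div → Mul → Mul_1) is RMSNorm decomposed into primitive ops. A minimal numpy sketch of what those nodes compute together (the epsilon value and the all-ones weight are illustrative assumptions):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Pow -> ReduceMean: mean of squares over the hidden dim (896)
    variance = np.mean(x ** 2, axis=-1, keepdims=True)
    # Add (eps) -> Sqrt -> Div -> Mul: x / sqrt(mean(x^2) + eps)
    x_normed = x / np.sqrt(variance + eps)
    # Mul_1: scale by the learned per-channel weight
    return x_normed * weight

x = np.random.randn(1, 4, 896).astype(np.float32)
w = np.ones(896, dtype=np.float32)  # placeholder for the learned weight
y = rms_norm(x, w)
```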
Operation: /model/layers.0/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.0/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.0/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.0/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.0/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/Unsqueeze_2 (Unsqueeze) Inputs: [('sequence_length', 'past_sequence_length + 1')] Outputs: [(1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.0/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()]
Operation: /model/layers.0/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()]
Operation: /model/layers.0/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.0/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)]
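Each of the q/k/v projections above follows the same quantized pattern: DynamicQuantizeLinear on the activation, MatMulInteger against the int8 weight, a Cast to float, then two Muls (scales_mul / output_scale_mul) that fold the activation and weight scales back in. A numpy sketch of that dataflow, with per-tensor scales; the weight quantization shown is an assumption (the actual export may use per-channel weight scales):

```python
import numpy as np

def dynamic_quantize(x):
    # DynamicQuantizeLinear: per-tensor uint8 quantization, range adjusted to include 0
    x_min, x_max = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (x_max - x_min) / 255.0
    zero_point = np.uint8(round(np.clip(-x_min / scale, 0, 255)))
    x_q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return x_q, np.float32(scale), zero_point

def quant_matmul(x, w_q, w_scale):
    x_q, x_scale, x_zp = dynamic_quantize(x)
    # MatMulInteger: int32 accumulation with the activation zero point removed
    acc = (x_q.astype(np.int32) - np.int32(x_zp)) @ w_q.astype(np.int32)
    # Cast -> scales_mul -> output_scale_mul: rescale the int32 result to float
    return acc.astype(np.float32) * (x_scale * w_scale)

# Hypothetical int8 weight for a 896 -> 128 projection (k_proj / v_proj shaped)
w = (np.random.randn(896, 128) * 0.05).astype(np.float32)
w_scale = np.float32(np.abs(w).max() / 127.0)
w_q = np.round(w / w_scale).astype(np.int8)
x = np.random.randn(1, 4, 896).astype(np.float32)
y = quant_matmul(x, w_q, w_scale)
```

The quantized result stays close to the float matmul `x @ w`, which is the point of the Cast/Mul rescaling tail.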
Operation: /model/layers.0/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/Unsqueeze_3 (Unsqueeze) Inputs: [(1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [(1, 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.0/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.0/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/Expand (Expand) Inputs: [(1, 1, 'sequence_length', 'past_sequence_length + 1'), (4,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.0/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/layers.0/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/layers.0/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)]
Operation: /model/Slice (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: Shape_2482 (Shape) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [(4,)]
Operation: /model/layers.0/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)]
Operation: /model/layers.0/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)]
Operation: /model/layers.0/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)]
Operation: /model/Add_1 (Add) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 1, 'past_sequence_length + 1')] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/Shape_5 (Shape) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [(4,)]
Operation: Gather_2484 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: Gather_2498 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: Gather_2505 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.0/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/Equal_1 (Equal) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: Range_2488 (Range) Inputs: [()] Outputs: [('batch_size',)]
Operation: Range_2502 (Range) Inputs: [()] Outputs: [('sequence_length',)]
Operation: Range_2509 (Range) Inputs: [()] Outputs: [('past_sequence_length + 1',)]
Operation: /model/layers.0/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)]
Operation: /model/layers.0/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Operation: /model/layers.0/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)]
Operation: /model/layers.0/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)]
Operation: /model/layers.0/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)]
Operation: /model/layers.0/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)]
Operation: /model/Where_1 (Where) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/Slice_3 (Slice) Inputs: [('past_sequence_length + 1',), (1,)] Outputs: [('past_sequence_length + 1',)]
Operation: Reshape_2517 (Reshape) Inputs: [('batch_size',)] Outputs: [('batch_size', 1, 1, 1)]
Operation: Reshape_2521 (Reshape) Inputs: [('sequence_length',)] Outputs: [('sequence_length', 1)]
Operation: /model/layers.0/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.0/self_attn/Shape_10 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)]
Operation: /model/layers.0/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
Operation: /model/layers.0/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)]
Operation: /model/layers.0/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)]
Operation: /model/Expand_1 (Expand) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (4,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/Add_2 (Add) Inputs: [('batch_size', 1, 1, 1)] Outputs: [('batch_size', 1, 1, 1)]
Operation: /model/layers.0/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()]
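The mask subgraph threaded through the listing so far (Trilu/Greater/Mul to build the causal pattern, Expand to batch it, Add_1 merging in the (batch, 1, 1, past+1) padding mask, and Where_1 selecting the fill value) produces the additive attention mask. A numpy sketch of the effective result; the -inf fill value and the 0/1 padding-mask input are assumptions consistent with the usual causal-mask construction:

```python
import numpy as np

def build_additive_mask(seq_len, past_len, padding_mask):
    # padding_mask: (batch, past_len + seq_len), 1 for real tokens, 0 for padding
    total = past_len + seq_len
    # Trilu/Greater: query at position past_len + i may attend to keys 0 .. past_len + i
    q_pos = np.arange(past_len, total)[:, None]   # (seq_len, 1)
    k_pos = np.arange(total)[None, :]             # (1, total)
    causal = k_pos <= q_pos                       # (seq_len, total)
    allowed = causal[None, None] & padding_mask[:, None, None, :].astype(bool)
    # Where: allowed positions contribute 0, masked positions -inf before Softmax
    return np.where(allowed, 0.0, -np.inf).astype(np.float32)

m = build_additive_mask(3, 2, np.ones((1, 5)))
```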
Operation: /model/layers.0/self_attn/Gather_11 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.0/self_attn/Gather_13 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.0/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/Add_3 (Add) Inputs: [('batch_size', 1, 1, 1), ('sequence_length', 1)] Outputs: [('batch_size', 1, 'sequence_length', 1)]
Operation: /model/layers.0/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.0/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.0/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/Add_4 (Add) Inputs: [('batch_size', 1, 'sequence_length', 1), ('past_sequence_length + 1',)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.0/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Operation: /model/layers.0/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Operation: /model/layers.0/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)]
Operation: /model/layers.0/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/Shape_6 (Shape) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [(4,)]
Operation: /model/layers.0/self_attn/Gather_3 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)]
Operation: /model/Equal_2 (Equal) Inputs: [(4,)] Outputs: [(4,)]
Operation: /model/layers.0/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)]
Operation: /model/Reshape_2 (Reshape) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (4,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/Where_2 (Where) Inputs: [(4,), (4,)] Outputs: [(4,)]
Operation: /model/layers.0/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.0/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
Operation: /model/Expand_2 (Expand) Inputs: [('batch_size', 1, 1, 1), (4,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/Expand_3 (Expand) Inputs: [(4,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/Expand_4 (Expand) Inputs: [('sequence_length', 1), (4,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
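The rotary_emb Slices, the Slice/Neg/Concat pairs on Q and K, and the cos/sin Muls above implement rotary position embeddings (RoPE) via the rotate-half trick, with head_dim = 64 as the shapes show. A numpy sketch; the frequency base of 10000 is the common default and is an assumption here:

```python
import numpy as np

def rotate_half(x):
    # Slice / Slice_1 -> Neg -> Concat: (x1, x2) -> (-x2, x1)
    x1, x2 = x[..., :32], x[..., 32:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, positions, head_dim=64, base=10000.0):
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    angles = positions[:, None] * inv_freq[None, :]      # (seq, 32)
    cos = np.concatenate([np.cos(angles)] * 2, axis=-1)  # (seq, 64)
    sin = np.concatenate([np.sin(angles)] * 2, axis=-1)
    # Mul (cos) + Mul_1 (sin on the rotated half), then Add
    return q * cos + rotate_half(q) * sin

q = np.random.randn(1, 14, 6, 64)
q_rot = apply_rope(q, np.arange(6))
```

Because each (i, i+32) pair is rotated by the same angle, the transform is a pure rotation and preserves vector norms.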
--------------------------------------------------
Operation: /model/Expand_5 (Expand)
Inputs: [('past_sequence_length + 1',), (4,)]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Add_1 (Add)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Add_2 (Add)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Reshape_6 (Reshape)
Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)]
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/Unsqueeze_11 (Unsqueeze)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 1)]
--------------------------------------------------
Operation: /model/Unsqueeze_12 (Unsqueeze)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 1)]
--------------------------------------------------
Operation: /model/Unsqueeze_13 (Unsqueeze)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 1)]
--------------------------------------------------
Operation: /model/Unsqueeze_14 (Unsqueeze)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 1)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Concat_5 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Mul_8 (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/Concat_2 (Concat)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 1), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 1), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 1), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 1)]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 4)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Shape_5 (Shape)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Unsqueeze_12 (Unsqueeze)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/ScatterND (ScatterND)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1', 4), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Gather_7 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Gather_9 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Unsqueeze_13 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Unsqueeze_15 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Concat_7 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Concat_8 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Equal (Equal)
Inputs: [(5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Where (Where)
Inputs: [(5,), (5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Expand (Expand)
Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)]
Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Reshape_4 (Reshape)
Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)]
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Shape_15 (Shape)
Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Transpose_3 (Transpose)
Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Gather_15 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Mul_9 (Mul)
Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
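The Unsqueeze -> Expand -> Reshape chain above (producing ('batch_size', 2, 7, 'past_sequence_length + 1', 64) and then ('batch_size', 14, ...)) is the ONNX lowering of grouped-query attention: the 2 cached key/value heads are broadcast to match the 14 query heads (group size 7). A minimal NumPy sketch of the same index mapping (`repeat_kv` is a hypothetical helper name, not from the graph):

```python
import numpy as np

def repeat_kv(kv: np.ndarray, n_rep: int) -> np.ndarray:
    """Expand (batch, num_kv_heads, seq, head_dim) to
    (batch, num_kv_heads * n_rep, seq, head_dim), mirroring the
    Unsqueeze -> Expand -> Reshape chain in the trace."""
    b, h_kv, s, d = kv.shape
    kv = kv[:, :, None, :, :]                         # Unsqueeze: (b, 2, 1, s, 64)
    kv = np.broadcast_to(kv, (b, h_kv, n_rep, s, d))  # Expand:    (b, 2, 7, s, 64)
    return kv.reshape(b, h_kv * n_rep, s, d)          # Reshape:   (b, 14, s, 64)

k = np.random.randn(1, 2, 5, 64).astype(np.float32)
k_rep = repeat_kv(k, 7)  # 14 query heads / 2 kv heads = group size 7
```

After the reshape, query heads 0..6 all attend against a copy of KV head 0 and heads 7..13 against KV head 1, which is what lets the cache stay at 2 heads.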
--------------------------------------------------
Operation: /model/layers.0/self_attn/Unsqueeze_30 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/MatMul (MatMul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Slice_4 (Slice)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Add_3 (Add)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Softmax (Softmax)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.0/self_attn/MatMul_1 (MatMul)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/Transpose_4 (Transpose)
Inputs: [('batch_size', 'sequence_length', 14, 64)]
Outputs: [('batch_size', 'sequence_length', 896)]
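The MatMul -> Add -> Softmax -> MatMul sequence is standard scaled dot-product attention: K arrives pre-transposed to ('batch_size', 14, 64, 'past_sequence_length + 1'), an additive mask broadcasts over the 14 heads, and the softmax runs over the past_sequence_length + 1 axis. A NumPy sketch, with the Mul_8/Mul_9 scaling (applied separately to Q and K-transpose in the graph) folded into a single 1/sqrt(head_dim) factor, which is my assumption about what those nodes compute:

```python
import numpy as np

def attention(q, k_t, v, mask, head_dim=64):
    # q: (b, 14, s, 64), k_t: (b, 14, 64, p1), v: (b, 14, p1, 64)
    scores = (q @ k_t) / np.sqrt(head_dim)        # MatMul plus the Mul_8/Mul_9 scaling
    scores = scores + mask                        # Add_3: mask broadcast over heads
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability (not in the graph)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # Softmax over past_sequence_length + 1
    return w @ v                                  # MatMul_1 -> (b, 14, s, 64)

b, s, p1 = 1, 3, 6
q = np.random.randn(b, 14, s, 64)
k_t = np.random.randn(b, 14, 64, p1)
v = np.random.randn(b, 14, p1, 64)
mask = np.zeros((b, 1, s, p1))  # stands in for the sliced causal mask
out = attention(q, k_t, v, mask)
```

The Transpose_4/Reshape_7 pair that follows merges the 14 heads back into the 896-wide hidden dimension before o_proj.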
--------------------------------------------------
Operation: /model/layers.0/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.0/self_attn/o_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/self_attn/o_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.0/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/Add (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/post_attention_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/post_attention_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.0/post_attention_layernorm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
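The DynamicQuantizeLinear -> MatMulInteger -> Cast -> Mul chain around each projection is dynamic activation quantization: activations are quantized to uint8 per tensor at run time (value, scale, and zero point are the three outputs), multiplied against pre-quantized integer weights in int32 arithmetic, cast to float, and rescaled by the product of the two scales (the `scales_mul` / `output_scale_mul` nodes). A sketch under the assumption that weights are symmetric int8 with zero point 0; the helper names are mine, not from the graph:

```python
import numpy as np

def dynamic_quantize(x):
    """Per-tensor asymmetric uint8 quantization in the style of
    ONNX DynamicQuantizeLinear: range always includes 0."""
    rmin = min(float(x.min()), 0.0)
    rmax = max(float(x.max()), 0.0)
    scale = (rmax - rmin) / 255.0
    if scale == 0.0:
        scale = 1.0
    zero_point = int(np.clip(round(-rmin / scale), 0, 255))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def quant_matmul(x, w_q, w_scale):
    """MatMulInteger (zero point subtracted), Cast to float, then one
    Mul by the combined activation*weight scale."""
    x_q, x_scale, x_zp = dynamic_quantize(x)
    acc = (x_q.astype(np.int32) - x_zp) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * np.float32(x_scale * w_scale)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 896)).astype(np.float32)
w = (rng.standard_normal((896, 896)) * 0.02).astype(np.float32)
w_scale = float(np.abs(w).max()) / 127.0
w_q = np.clip(np.round(w / w_scale), -127, 127).astype(np.int8)
y = quant_matmul(x, w_q, w_scale)
```

Because the activation scale is recomputed per call, no calibration data is needed; the cost is the extra min/max pass that DynamicQuantizeLinear performs on every forward.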
--------------------------------------------------
Operation: /model/layers.0/post_attention_layernorm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.0/post_attention_layernorm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.0/post_attention_layernorm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/post_attention_layernorm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.0/mlp/gate_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.0/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.0/mlp/gate_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.0/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
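The Pow -> ReduceMean -> Add -> Sqrt -> Div -> Mul -> Mul chain in each `*_layernorm` block is RMSNorm written out in primitive ops: square, mean over the 896-wide hidden axis, add epsilon, square root, reciprocal (the Div on a (b, s, 1) tensor), multiply into the activations, then multiply by the learned weight. A sketch; the epsilon value is an assumption (models in this family typically use 1e-6):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Pow(x, 2) -> ReduceMean over the last axis -> Add eps   : (b, s, 1)
    variance = np.mean(x ** 2, axis=-1, keepdims=True)
    # Sqrt -> Div (reciprocal of the RMS)                     : (b, s, 1)
    inv_rms = 1.0 / np.sqrt(variance + eps)
    # Mul (normalize) -> Mul_1 (learned per-channel weight)   : (b, s, 896)
    return (x * inv_rms) * weight

x = np.random.randn(2, 4, 896).astype(np.float32)
w = np.ones(896, dtype=np.float32)  # stand-in for the trained weight vector
y = rms_norm(x, w)
```

Unlike LayerNorm there is no mean subtraction and no bias, which is why the trace shows no Sub or second ReduceMean.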
--------------------------------------------------
Operation: /model/layers.0/mlp/up_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.0/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.0/mlp/up_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.0/mlp/up_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.0/mlp/act_fn/Sigmoid (Sigmoid)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.0/mlp/act_fn/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.0/mlp/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.0/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864), (), ()]
--------------------------------------------------
Operation: /model/layers.0/mlp/down_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
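The MLP block is a SiLU-gated feed-forward (SwiGLU-style): gate_proj and up_proj both map 896 -> 4864, `act_fn` applies SiLU to the gate (the Sigmoid followed by `act_fn/Mul` computes x * sigmoid(x)), `mlp/Mul` multiplies it elementwise with the up projection, and down_proj maps 4864 -> 896. A float sketch with random weights standing in for the quantized projections:

```python
import numpy as np

def swiglu_mlp(x, w_gate, w_up, w_down):
    gate = x @ w_gate                              # gate_proj: 896 -> 4864
    up = x @ w_up                                  # up_proj:   896 -> 4864
    silu = gate * (1.0 / (1.0 + np.exp(-gate)))    # act_fn: Sigmoid then Mul
    return (silu * up) @ w_down                    # mlp/Mul, then down_proj: 4864 -> 896

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 3, 896)).astype(np.float32)
w_gate = (rng.standard_normal((896, 4864)) * 0.02).astype(np.float32)
w_up = (rng.standard_normal((896, 4864)) * 0.02).astype(np.float32)
w_down = (rng.standard_normal((4864, 896)) * 0.02).astype(np.float32)
y = swiglu_mlp(x, w_gate, w_up, w_down)
```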
--------------------------------------------------
Operation: /model/layers.0/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/mlp/down_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.0/mlp/down_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.0/Add_1 (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/input_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/input_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.1/input_layernorm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.1/input_layernorm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.1/input_layernorm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.1/input_layernorm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/input_layernorm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Shape (Shape)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [(3,)]
--------------------------------------------------
Operation: /model/layers.1/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/q_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/q_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/k_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/k_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 128), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/v_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/v_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 128), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather (Gather)
Inputs: [(3,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_1 (Gather)
Inputs: [(3,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/q_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/k_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
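The projection widths make the grouped-query layout explicit: q_proj keeps the full 896 = 14 x 64 width while k_proj and v_proj emit only 128 = 2 x 64 (the q/k/v Add nodes that follow are the projection biases). A sketch of the projection and the Reshape/Transpose head split that follows it, with random weights standing in for the quantized ones:

```python
import numpy as np

hidden, n_heads, n_kv_heads, head_dim = 896, 14, 2, 64
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 5, hidden)).astype(np.float32)
w_q = (rng.standard_normal((hidden, n_heads * head_dim)) * 0.02).astype(np.float32)
w_k = (rng.standard_normal((hidden, n_kv_heads * head_dim)) * 0.02).astype(np.float32)
w_v = (rng.standard_normal((hidden, n_kv_heads * head_dim)) * 0.02).astype(np.float32)

def split_heads(t, n):
    # Reshape (b, s, n*64) -> (b, s, n, 64), then Transpose -> (b, n, s, 64)
    b, s, _ = t.shape
    return t.reshape(b, s, n, head_dim).transpose(0, 2, 1, 3)

q = split_heads(x @ w_q, n_heads)     # (1, 14, 5, 64)
k = split_heads(x @ w_k, n_kv_heads)  # (1, 2, 5, 64)
v = split_heads(x @ w_v, n_kv_heads)  # (1, 2, 5, 64)
```

Keeping K/V at 2 heads shrinks both the projection weights and the per-token KV-cache footprint by 7x relative to full multi-head attention.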
--------------------------------------------------
Operation: /model/layers.1/self_attn/v_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_1 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_1 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_11 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(3,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Reshape (Reshape)
Inputs: [('batch_size', 'sequence_length', 896), (4,)]
Outputs: [('batch_size', 'sequence_length', 14, 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Reshape_1 (Reshape)
Inputs: [('batch_size', 'sequence_length', 128), (4,)]
Outputs: [('batch_size', 'sequence_length', 2, 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Reshape_2 (Reshape)
Inputs: [('batch_size', 'sequence_length', 128), (4,)]
Outputs: [('batch_size', 'sequence_length', 2, 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Transpose (Transpose)
Inputs: [('batch_size', 'sequence_length', 14, 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Transpose_1 (Transpose)
Inputs: [('batch_size', 'sequence_length', 2, 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Transpose_2 (Transpose)
Inputs: [('batch_size', 'sequence_length', 2, 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Shape_2 (Shape)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_6 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Slice (Slice)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Slice_1 (Slice)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Slice_2 (Slice)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Slice_3 (Slice)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_2 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Shape_11 (Shape)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_21 (Unsqueeze)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
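The Concat nodes that take a ('batch_size', 2, 'sequence_length', 64) tensor and emit ('batch_size', 2, 'past_sequence_length + 1', 64) are the KV-cache append: the freshly projected K/V for the current step is concatenated onto the cached past along the time axis. (The cached-past input is a graph input, which is presumably why it is missing from these Inputs listings.) A sketch:

```python
import numpy as np

def append_kv(past, current):
    """Concat along axis 2: (b, 2, past, 64) + (b, 2, s, 64)
    -> (b, 2, past + s, 64); during decoding s == 1, giving
    the 'past_sequence_length + 1' dimension seen in the trace."""
    return np.concatenate([past, current], axis=2)

past_k = np.zeros((1, 2, 9, 64), dtype=np.float32)  # cached keys from prior steps
new_k = np.ones((1, 2, 1, 64), dtype=np.float32)    # this step's key
k = append_kv(past_k, new_k)
```

The cache grows by one position per generated token, which is why every downstream attention shape carries the symbolic 'past_sequence_length + 1' dimension.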
--------------------------------------------------
Operation: /model/layers.1/self_attn/Neg (Neg)
Inputs: [('batch_size', 14, 'sequence_length', 32)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Neg_1 (Neg)
Inputs: [('batch_size', 2, 'sequence_length', 32)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Add (Add)
Inputs: [(), ()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_12 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_14 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_3 (Concat)
Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_4 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/rotary_emb/Unsqueeze (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_22 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_24 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/rotary_emb/Slice (Slice)
Inputs: [(1,)]
Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/rotary_emb/Slice_1 (Slice)
Inputs: [(1,)]
Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_9 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_10 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_4 (Gather)
Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Outputs: [('batch_size', 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_5 (Gather)
Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Outputs: [('batch_size', 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Equal_1 (Equal)
Inputs: [(5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_6 (Unsqueeze)
Inputs: [('batch_size', 'sequence_length', 64)]
Outputs: [('batch_size', 1, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_7 (Unsqueeze)
Inputs: [('batch_size', 'sequence_length', 64)]
Outputs: [('batch_size', 1, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Where_1 (Where)
Inputs: [(5,), (5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Mul (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Mul_2 (Mul)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Mul_1 (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Mul_3 (Mul)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Expand_1 (Expand)
Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)]
Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Add_1 (Add)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Add_2 (Add)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Reshape_6 (Reshape)
Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)]
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_5 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Mul_8 (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
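The Slice -> Neg -> Concat chains combined with the cos/sin Mul/Add pairs (Mul, Mul_1, Mul_2, Mul_3, Add_1, Add_2) are rotary position embedding: each 64-wide head is split into two 32-wide halves, the halves are swapped with a sign flip ("rotate_half"), and q*cos + rotate_half(q)*sin is formed with position tables gathered from the rotary_emb cache (sliced to at most 32768 positions, per the Min(...) output shape). A sketch; the base 10000.0 for the inverse frequencies is an assumption, not read from the graph:

```python
import numpy as np

def rotate_half(x):
    # Slice / Slice_1 -> Neg -> Concat_3: (x1, x2) becomes (-x2, x1)
    x1, x2 = x[..., :32], x[..., 32:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, cos, sin):
    # Mul / Mul_1 -> Add_1: elementwise rotation of each (x_i, x_{i+32}) pair
    return q * cos + rotate_half(q) * sin

# cos/sin are (b, 1, s, 64), broadcasting over all heads (Unsqueeze_6/7)
positions = np.arange(5)
inv_freq = 1.0 / (10000.0 ** (np.arange(0, 64, 2) / 64.0))   # assumed RoPE base
freqs = np.outer(positions, inv_freq)                        # (5, 32)
emb = np.concatenate([freqs, freqs], axis=-1)                # (5, 64)
cos, sin = np.cos(emb)[None, None], np.sin(emb)[None, None]  # (1, 1, 5, 64)
q = np.random.randn(1, 14, 5, 64)
q_rot = apply_rope(q, cos, sin)
```

Since each pair is rotated by a pure angle, the per-head vector norms are unchanged; only the relative phase between positions is encoded.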
--------------------------------------------------
Operation: /model/layers.1/self_attn/Shape_6 (Shape)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_12 (Unsqueeze)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_8 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_10 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_13 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_15 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_7 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Concat_8 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Equal (Equal)
Inputs: [(5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Where (Where)
Inputs: [(5,), (5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Expand (Expand)
Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)]
Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Reshape_4 (Reshape)
Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)]
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Shape_16 (Shape)
Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Transpose_3 (Transpose)
Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Gather_16 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Mul_9 (Mul)
Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Unsqueeze_30 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/MatMul (MatMul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Slice_4 (Slice)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Add_3 (Add)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Softmax (Softmax)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.1/self_attn/MatMul_1 (MatMul)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Transpose_4 (Transpose)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 'sequence_length', 14, 64)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Reshape_7 (Reshape)
Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/o_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/self_attn/o_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/Add (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/post_attention_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/post_attention_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.1/post_attention_layernorm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.1/post_attention_layernorm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.1/post_attention_layernorm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.1/post_attention_layernorm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/post_attention_layernorm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.1/mlp/gate_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/gate_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/up_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/up_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/mlp/up_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/act_fn/Sigmoid (Sigmoid)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/act_fn/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.1/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864), (), ()]
--------------------------------------------------
Operation: /model/layers.1/mlp/down_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/mlp/down_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.1/mlp/down_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.1/Add_1 (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.2/input_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.2/input_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.2/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.2/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.2/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.2/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.2/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.2/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] 
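The `Pow → ReduceMean → Add → Sqrt → Div → Mul → Mul_1` chains under each `input_layernorm`/`post_attention_layernorm` are RMSNorm decomposed into primitive ONNX ops. A minimal NumPy sketch of that chain (the `eps` value is an assumption, not read from the graph):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """RMSNorm as the Pow -> ReduceMean -> Add -> Sqrt -> Div -> Mul chain in the dump."""
    variance = np.mean(np.power(x, 2), axis=-1, keepdims=True)  # Pow + ReduceMean -> (..., 1)
    inv_rms = 1.0 / np.sqrt(variance + eps)                     # Add + Sqrt + Div
    return x * inv_rms * weight                                 # Mul (normalize) + Mul_1 (learned scale)

hidden = np.random.randn(1, 8, 896).astype(np.float32)  # ('batch_size', 'sequence_length', 896)
gamma = np.ones(896, dtype=np.float32)
out = rms_norm(hidden, gamma)
print(out.shape)  # (1, 8, 896)
```

Note the intermediate `(batch_size, sequence_length, 1)` shapes in the log are exactly the `keepdims` reduction tensor flowing through `Add`/`Sqrt`/`Div`.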
-------------------------------------------------- Operation: /model/layers.2/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.2/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.2/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.2/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.2/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.2/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: 
[('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.2/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Reshape_1 
(Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] 
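Every projection in the dump follows the same five-op quantized pattern: `DynamicQuantizeLinear → MatMulInteger → Cast → *_quant_scales_mul → *_quant_output_scale_mul`. A simplified sketch of what that pipeline computes, assuming per-tensor uint8 activation quantization and pre-quantized int8 weights (the actual graph may use per-channel weight scales):

```python
import numpy as np

def dynamic_quant_matmul(x: np.ndarray, w_q: np.ndarray, w_scale: float) -> np.ndarray:
    """Sketch of DynamicQuantizeLinear -> MatMulInteger -> Cast -> scale Muls."""
    # DynamicQuantizeLinear: per-tensor uint8 range covering zero
    x_min, x_max = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    x_scale = (x_max - x_min) / 255.0
    if x_scale == 0.0:
        x_scale = 1.0
    x_zp = int(np.clip(np.round(-x_min / x_scale), 0, 255))
    x_q = np.clip(np.round(x / x_scale) + x_zp, 0, 255).astype(np.uint8)

    # MatMulInteger: int32 accumulation with the activation zero point removed
    acc = (x_q.astype(np.int32) - x_zp) @ w_q.astype(np.int32)

    # Cast + scales_mul + output_scale_mul: rescale the int32 result back to float
    return acc.astype(np.float32) * (x_scale * w_scale)

x = np.random.randn(1, 4, 896).astype(np.float32)                 # e.g. k_proj input
w_q = np.random.randint(-127, 128, size=(896, 128), dtype=np.int8)
y = dynamic_quant_matmul(x, w_q, w_scale=0.01)
print(y.shape)  # (1, 4, 128)
```

This matches the logged shapes: the `MatMulInteger` already carries the output width (896, 128, or 4864), and the scalar `()` tensors are the scale/zero-point outputs of `DynamicQuantizeLinear`.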
-------------------------------------------------- Operation: /model/layers.2/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 
'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] 
-------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 
'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] 
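The `Slice`/`Slice_1` + `Neg` + `Concat_3` ops are the "rotate half" of rotary position embeddings, and the `Mul`/`Mul_1`/`Add_1` (queries) and `Mul_2`/`Mul_3`/`Add_2` (keys) apply the gathered cos/sin tables. A sketch under the assumption of a standard RoPE cache with base 10000 (the actual base is not visible in the dump):

```python
import numpy as np

def rotate_half(x: np.ndarray) -> np.ndarray:
    # Slice / Slice_1 + Neg + Concat: split (..., 64) into halves, negate and swap
    x1, x2 = x[..., :32], x[..., 32:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, k, cos, sin):
    # Mul + Mul_1 + Add_1 for queries; Mul_2 + Mul_3 + Add_2 for keys
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

# cos/sin tables as produced by rotary_emb/Slice + Gather_4/Gather_5 (base assumed)
pos = np.arange(10)
inv_freq = 1.0 / (10000.0 ** (np.arange(0, 64, 2) / 64.0))
freqs = np.outer(pos, inv_freq)                   # (seq, 32)
emb = np.concatenate([freqs, freqs], axis=-1)     # (seq, 64)
cos = np.cos(emb)[None, None]                     # (1, 1, seq, 64) after Unsqueeze_6
sin = np.sin(emb)[None, None]                     # (1, 1, seq, 64) after Unsqueeze_7

q = np.random.randn(1, 14, 10, 64)
k = np.random.randn(1, 2, 10, 64)
q_rot, k_rot = apply_rope(q, k, cos, sin)
print(q_rot.shape, k_rot.shape)  # (1, 14, 10, 64) (1, 2, 10, 64)
```

The `(batch_size, 1, sequence_length, 64)` tensors from `Unsqueeze_6`/`Unsqueeze_7` are exactly these broadcastable cos/sin tables.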
-------------------------------------------------- Operation: /model/layers.2/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.2/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.2/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.2/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] 
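The `Unsqueeze_12/21 → Expand → Reshape_4/6` sequence, with its telltale `(batch_size, 2, 7, past_sequence_length + 1, 64)` intermediate, is grouped-query attention: the 2 cached KV heads are tiled 7 times to match the 14 query heads. A minimal sketch:

```python
import numpy as np

def repeat_kv(kv: np.ndarray, n_rep: int = 7) -> np.ndarray:
    # Unsqueeze -> Expand -> Reshape: tile each of the 2 KV heads n_rep times
    b, n_kv, t, d = kv.shape
    expanded = np.broadcast_to(kv[:, :, None], (b, n_kv, n_rep, t, d))
    return expanded.reshape(b, n_kv * n_rep, t, d)

k = np.random.randn(1, 2, 10, 64).astype(np.float32)
k_rep = repeat_kv(k)
print(k_rep.shape)  # (1, 14, 10, 64)
```

The surrounding `Shape`/`Gather`/`Concat`/`Equal`/`Where` scalar ops in the log just assemble the dynamic target shape `(5,)` tensor that `Expand` consumes.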
-------------------------------------------------- Operation: /model/layers.2/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.2/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.2/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.2/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.2/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: 
[('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.2/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.2/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.2/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] 
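The attention core is readable straight off the shapes: `MatMul` produces `(batch_size, 14, sequence_length, past_sequence_length + 1)` scores, `Slice_4`+`Add_3` add the broadcast causal mask, then `Softmax`, `MatMul_1` against the values, and `Transpose_4`+`Reshape_7` fold the heads back to 896. A sketch (the graph splits the `1/sqrt(64)` scale across `Mul_8` on q and `Mul_9` on kᵀ; here it is folded into one division, which is numerically equivalent):

```python
import numpy as np

def attention(q, k, v, mask):
    d = q.shape[-1]
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d)   # Transpose_3 + MatMul (+ Mul_8/Mul_9 scale)
    scores = scores + mask                              # Slice_4 + Add_3: causal mask
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)               # Softmax
    ctx = w @ v                                         # MatMul_1
    b, h, t, _ = ctx.shape
    return ctx.transpose(0, 2, 1, 3).reshape(b, t, h * d)  # Transpose_4 + Reshape_7 -> (b, t, 896)

t = 10
mask = np.triu(np.full((1, 1, t, t), -1e9), k=1)  # 0 on/below the diagonal, -1e9 above
q = np.random.randn(1, 14, t, 64)
kv = np.random.randn(1, 14, t, 64)
out = attention(q, kv, kv, mask)
print(out.shape)  # (1, 10, 896)
```

During decoding (`sequence_length = 1`) the same graph computes a single score row of length `past_sequence_length + 1` against the concatenated KV cache.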
-------------------------------------------------- Operation: /model/layers.2/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.2/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: 
[('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.2/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.2/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.2/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] 
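The MLP block is a SwiGLU feed-forward: `gate_proj` and `up_proj` expand 896 → 4864, `act_fn/Sigmoid` + `act_fn/Mul` form SiLU on the gate, `mlp/Mul` gates the up branch, and `down_proj` contracts back to 896. A sketch with plain float weights (the real graph runs each projection through the quantized MatMul pattern shown earlier):

```python
import numpy as np

def swiglu_mlp(x, w_gate, w_up, w_down):
    gate = x @ w_gate                       # gate_proj: (..., 896) -> (..., 4864)
    up = x @ w_up                           # up_proj
    silu = gate / (1.0 + np.exp(-gate))     # act_fn: x * sigmoid(x) == x / (1 + e^-x)
    return (silu * up) @ w_down             # mlp/Mul + down_proj: back to (..., 896)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 896)) * 0.02
w_gate = rng.standard_normal((896, 4864)) * 0.02
w_up = rng.standard_normal((896, 4864)) * 0.02
w_down = rng.standard_normal((4864, 896)) * 0.02
y = swiglu_mlp(x, w_gate, w_up, w_down)
print(y.shape)  # (1, 4, 896)
```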
-------------------------------------------------- Operation: /model/layers.2/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.2/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.3/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.3/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.3/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.3/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.3/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.3/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.3/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.3/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.3/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 
'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.3/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.3/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.3/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.3/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] 
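The Pow → ReduceMean → Add → Sqrt → Div → Mul → Mul_1 chain that opens each layer (the `input_layernorm` subgraph above) is RMSNorm. A minimal NumPy sketch of what that op sequence computes; the epsilon value is an assumption (the actual constant lives in the graph's initializers, not in this dump):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Pow -> ReduceMean: mean of squares over the hidden dim (896)
    variance = np.mean(np.power(x, 2), axis=-1, keepdims=True)
    # Add(eps) -> Sqrt -> Div: reciprocal root-mean-square
    inv_rms = 1.0 / np.sqrt(variance + eps)
    # Mul: normalize; Mul_1: apply the learned per-channel scale
    return (x * inv_rms) * weight

x = np.random.randn(1, 4, 896).astype(np.float32)
w = np.ones(896, dtype=np.float32)
y = rms_norm(x, w)
```

With a unit weight, each output vector has RMS ≈ 1, which is the whole point of the normalization.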
-------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 
'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] 
Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: 
/model/layers.3/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] 
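The Slice/Neg/Concat ops on the (…, 64) head vectors, together with the `rotary_emb` cos/sin table lookups (Gather_4/Gather_5) and the Mul/Mul_1…Mul_3 products above, implement rotary position embeddings (RoPE). A sketch of the computation, assuming the standard rotate-half formulation; the frequency base below is illustrative (the real `rope_theta` is baked into the graph's cached tables):

```python
import numpy as np

def rotate_half(x):
    # Slice / Slice_1 -> Neg -> Concat: (x1, x2) becomes (-x2, x1)
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, k, cos, sin):
    # Mul + Mul_1 + Add_1 for queries, Mul_2 + Mul_3 + Add_2 for keys
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

head_dim, seq = 64, 5
inv_freq = 1.0 / (10000.0 ** (np.arange(0, head_dim, 2) / head_dim))  # base is an assumption
t = np.arange(seq)[:, None] * inv_freq[None, :]
emb = np.concatenate([t, t], axis=-1)  # mirrors the cached (max_len, 64) cos/sin tables
cos, sin = np.cos(emb)[None, None], np.sin(emb)[None, None]

q = np.random.randn(1, 14, seq, head_dim)
k = np.random.randn(1, 2, seq, head_dim)
q2, k2 = apply_rope(q, k, cos, sin)
```

Because RoPE is a pure rotation of each (x_i, x_{i+32}) pair, it changes directions but preserves vector norms.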
-------------------------------------------------- Operation: /model/layers.3/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] 
Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 
1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.3/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.3/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.3/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.3/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.3/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.3/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.3/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] 
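The MatMul → Add_3 → Softmax → MatMul_1 sequence just above is the attention core: scores against the concatenated KV cache, an additive causal mask (Slice_4 trims the mask to the current window), and the weighted sum over values. The Mul_8/Mul_9 pair suggests the 1/sqrt(d) scaling is split between Q and K^T; that split (d ** -0.25 on each side) is an assumption, since only the op pattern is visible here. A sketch:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask):
    d = q.shape[-1]
    scale = d ** -0.25  # assumed split of 1/sqrt(d) across Mul_8 (on q) and Mul_9 (on k^T)
    # MatMul: (B, 14, S, 64) x (B, 14, 64, T) -> (B, 14, S, T)
    scores = (q * scale) @ (np.swapaxes(k, -1, -2) * scale)
    scores = scores + mask            # Add_3: additive causal mask
    return softmax(scores) @ v        # Softmax -> MatMul_1

B, H, S, d = 1, 14, 4, 64
q = np.random.randn(B, H, S, d)
k = np.random.randn(B, H, S, d)   # keys/values here are already expanded to 14 heads
v = np.random.randn(B, H, S, d)
mask = np.triu(np.full((S, S), -1e9), k=1)[None, None]
out = attention(q, k, v, mask)
```

With a causal mask, position 0 can only attend to itself, so its output is exactly its own value vector.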
-------------------------------------------------- Operation: /model/layers.3/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.3/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] 
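Every linear layer in this graph (q/k/v/o projections and the MLP projections) uses the same five-op pattern: DynamicQuantizeLinear on the activations, MatMulInteger against pre-quantized int8 weights, Cast to float, then two Muls that apply the combined activation × weight scale. A simplified NumPy sketch of that pattern, assuming per-tensor symmetric weight quantization (the real graph may use per-channel scales):

```python
import numpy as np

def dynamic_quantize_linear(x):
    # DynamicQuantizeLinear: runtime uint8 quantization with a per-tensor scale/zero-point
    lo, hi = min(x.min(), 0.0), max(x.max(), 0.0)  # range must include 0
    scale = (hi - lo) / 255.0
    zp = np.clip(np.round(-lo / scale), 0, 255).astype(np.uint8)
    x_q = np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)
    return x_q, np.float32(scale), zp

def quantized_matmul(x, w_q, w_scale, w_zp=0):
    x_q, x_scale, x_zp = dynamic_quantize_linear(x)
    # MatMulInteger: int32 accumulation with zero-points subtracted; Cast to float
    acc = (x_q.astype(np.int32) - np.int32(x_zp)) @ (w_q.astype(np.int32) - np.int32(w_zp))
    # scales_mul (x_scale * w_scale), then output_scale_mul
    return acc.astype(np.float32) * (x_scale * w_scale)

np.random.seed(0)
x = np.random.randn(2, 8).astype(np.float32)
w = np.random.randn(8, 4).astype(np.float32)
w_scale = np.float32(np.abs(w).max() / 127.0)     # offline symmetric int8 weight quantization
w_q = np.round(w / w_scale).astype(np.int8)
y = quantized_matmul(x, w_q, w_scale)
```

The result approximates the float MatMul to within quantization error; the per-layer `Add` that follows each projection then applies the bias in float.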
-------------------------------------------------- Operation: /model/layers.3/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.3/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.3/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.3/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.3/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.3/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.3/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: 
[('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.3/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.3/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.3/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.3/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.3/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.3/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.3/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- 
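The MLP subgraph above is a SwiGLU block: `gate_proj` and `up_proj` both expand 896 → 4864, `act_fn/Sigmoid` + `act_fn/Mul` form SiLU on the gate branch, `mlp/Mul` gates the up branch, and `down_proj` contracts back to 896. A sketch with quantization omitted for clarity (dimensions below are shrunk for the example):

```python
import numpy as np

def silu(x):
    # act_fn/Sigmoid followed by act_fn/Mul: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # gate_proj -> SiLU, up_proj, elementwise mlp/Mul, down_proj
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

x = np.random.randn(1, 3, 8)
w_gate, w_up = np.random.randn(8, 32), np.random.randn(8, 32)
w_down = np.random.randn(32, 8)
out = swiglu_mlp(x, w_gate, w_up, w_down)
```

SiLU is smooth and zero at the origin, so the gate can softly switch channels of the up projection on and off.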
Operation: /model/layers.3/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.3/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.3/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.3/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.4/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.4/input_layernorm/Sqrt (Sqrt) Inputs: 
[('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.4/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.4/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.4/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.4/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 
896)] -------------------------------------------------- Operation: /model/layers.4/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.4/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.4/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.4/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.4/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.4/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] 
-------------------------------------------------- Operation: /model/layers.4/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.4/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] 
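The Reshape/Transpose pairs above split the flat projections into attention heads: queries go from (batch, seq, 896) to (batch, 14, seq, 64), while keys and values go from (batch, seq, 128) to (batch, 2, seq, 64) — 14 query heads against only 2 KV heads, i.e. grouped-query attention. A minimal sketch of that layout change:

```python
import numpy as np

def split_heads(x, n_heads, head_dim=64):
    # Reshape: (B, S, n_heads * 64) -> (B, S, n_heads, 64)
    # Transpose perm=(0, 2, 1, 3): -> (B, n_heads, S, 64)
    b, s, _ = x.shape
    return x.reshape(b, s, n_heads, head_dim).transpose(0, 2, 1, 3)

q = split_heads(np.zeros((1, 5, 896)), 14)  # 14 query heads
k = split_heads(np.zeros((1, 5, 128)), 2)   # 2 key/value heads (GQA)
```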
-------------------------------------------------- Operation: /model/layers.4/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] 
-------------------------------------------------- Operation: /model/layers.4/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_22 (Unsqueeze) 
Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: 
[(5,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] 
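The `Slice`/`Neg`/`Concat_3` nodes followed by the `Mul`/`Mul_1`/`Add_1` nodes above are the standard rotary position embedding (RoPE) decomposition: split the 64-wide head dimension into two 32-wide halves, build `rotate_half(x)`, then combine with the cos/sin tables gathered by `Gather_4`/`Gather_5` (broadcast over the 14 query heads via `Unsqueeze_6`/`Unsqueeze_7`). A sketch of what that subgraph computes, with hypothetical inputs:

```python
import numpy as np

def rotate_half(x):
    # Slice / Slice_1 + Neg + Concat_3: swap the two halves of the head dim
    # and negate the second half.
    x1, x2 = x[..., :32], x[..., 32:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, cos, sin):
    # Mul + Mul_1 + Add_1: q * cos + rotate_half(q) * sin, with the (batch, seq, 64)
    # cos/sin tables broadcast over the head axis (the Unsqueeze to (batch, 1, seq, 64)).
    return q * cos[:, None, :, :] + rotate_half(q) * sin[:, None, :, :]

q = np.random.randn(1, 14, 5, 64).astype(np.float32)
cos = np.ones((1, 5, 64), dtype=np.float32)
sin = np.zeros((1, 5, 64), dtype=np.float32)
# With cos=1 and sin=0 the rotation is the identity.
assert np.allclose(apply_rope(q, cos, sin), q)
```

The same pattern is applied to the 2-head key tensor (`Mul_2`/`Mul_3`/`Add_2`).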
-------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] 
-------------------------------------------------- Operation: /model/layers.4/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.4/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.4/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.4/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.4/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- 
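The `Unsqueeze_12`/`Unsqueeze_21` → `Expand`/`Expand_1` → `Reshape_4`/`Reshape_6` chains above implement grouped-query attention's `repeat_kv`: each of the 2 cached KV heads is broadcast 7 times so all 14 query heads have a matching key/value head. A numpy sketch (function name is my own):

```python
import numpy as np

def repeat_kv(kv, n_rep=7):
    # Unsqueeze -> Expand -> Reshape from the trace: insert a size-1 axis,
    # broadcast it to n_rep, then fold it into the head axis.
    b, kvh, s, d = kv.shape
    expanded = np.broadcast_to(kv[:, :, None, :, :], (b, kvh, n_rep, s, d))
    return expanded.reshape(b, kvh * n_rep, s, d)

k = np.random.randn(1, 2, 11, 64).astype(np.float32)
print(repeat_kv(k).shape)  # (1, 14, 11, 64)
```

Heads 0–6 of the result are copies of KV head 0, and heads 7–13 are copies of KV head 1, matching the `(batch, 2, 7, past+1, 64)` → `(batch, 14, past+1, 64)` shapes in the trace.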
Operation: /model/layers.4/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.4/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.4/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.4/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] 
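The `MatMul` → `Add_3` → `Softmax` → `MatMul_1` sequence above is the attention core. Note the trace splits the 1/√64 scaling across the `Mul_8` (on Q) and `Mul_9` (on Kᵀ) nodes; the sketch below folds it into a single factor, which is numerically equivalent. All tensors here are hypothetical stand-ins:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask):
    # MatMul: (b,14,s,64) @ (b,14,64,past+s) -> (b,14,s,past+s),
    # scaled by 1/sqrt(head_dim) (split across Mul_8/Mul_9 in the trace).
    scores = (q @ k.transpose(0, 1, 3, 2)) / 8.0
    scores = scores + mask        # Add_3: broadcast the (b,1,s,past+s) causal mask
    probs = softmax(scores)       # Softmax over the key axis
    return probs @ v              # MatMul_1: weighted sum of values

q = np.random.randn(1, 14, 5, 64).astype(np.float32)
k = np.random.randn(1, 14, 11, 64).astype(np.float32)
v = np.random.randn(1, 14, 11, 64).astype(np.float32)
mask = np.zeros((1, 1, 5, 11), dtype=np.float32)
print(attention(q, k, v, mask).shape)  # (1, 14, 5, 64)
```

The `Transpose_4` + `Reshape_7` that follow merge the 14 heads back into the 896-wide hidden dimension before `o_proj`.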
-------------------------------------------------- Operation: /model/layers.4/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.4/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.4/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.4/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.4/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.4/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.4/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) 
Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.4/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.4/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.4/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.4/Add_1 (Add) Inputs: [('batch_size', 
'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.5/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.5/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.5/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.5/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.5/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] 
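The MLP block traced above (`gate_proj`, `act_fn/Sigmoid` + `act_fn/Mul`, `up_proj`, `mlp/Mul`, `down_proj`) is a SwiGLU feed-forward: SiLU-gated projection from 896 up to 4864 and back. A sketch with randomly initialized stand-in weights (the weight names are my own; the real weights live inside the quantized MatMul nodes):

```python
import numpy as np

def silu(x):
    # act_fn/Sigmoid followed by act_fn/Mul: SiLU(x) = x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def mlp(x, w_gate, w_up, w_down):
    # gate_proj / up_proj: 896 -> 4864; mlp/Mul gates elementwise; down_proj: 4864 -> 896
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

x = np.random.randn(1, 5, 896).astype(np.float32)
w_gate = np.random.randn(896, 4864).astype(np.float32) * 0.01
w_up = np.random.randn(896, 4864).astype(np.float32) * 0.01
w_down = np.random.randn(4864, 896).astype(np.float32) * 0.01
print(mlp(x, w_gate, w_up, w_down).shape)  # (1, 5, 896)
```

The `/model/layers.4/Add_1` node then adds this MLP output back onto the residual stream.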
-------------------------------------------------- Operation: /model/layers.5/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.5/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.5/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.5/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: 
/model/layers.5/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.5/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.5/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: 
[(4,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: 
[('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_14 (Gather) 
Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_5 (Gather) Inputs: 
[('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 
'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: 
[()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.5/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] 
-------------------------------------------------- Operation: /model/layers.5/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.5/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.5/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.5/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.5/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.5/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.5/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] 
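The MatMul -> Slice_4/Add_3 (mask add) -> Softmax -> MatMul_1 -> Transpose_4 -> Reshape_7 run above is the attention core of the layer. A float sketch under the traced shapes; the exporter appears to split the 1/sqrt(head_dim) scale across the separate Mul_8 (on Q) and Mul_9 (on K^T) ops, which is numerically equivalent to applying it once as done here:

```python
import numpy as np

def attention(q, k_t, v, mask):
    """As traced: q (b, 14, s, 64), k_t (b, 14, 64, L), v (b, 14, L, 64), mask (b, 1, s, L).
    Mirrors MatMul -> Add(mask) -> Softmax -> MatMul_1 -> Transpose_4 -> Reshape_7."""
    d = q.shape[-1]
    scores = (q / np.sqrt(d)) @ k_t + mask        # (b, heads, s, L)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    out = w @ v                                   # (b, heads, s, head_dim)
    b, h, s, dh = out.shape
    # Transpose_4 then Reshape_7: merge heads back into the 896-wide hidden axis
    return out.transpose(0, 2, 1, 3).reshape(b, s, h * dh)
```

Note that the mask input to Add_3 has a broadcastable head dimension of 1, matching the Slice_4 output shape in the trace.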
-------------------------------------------------- Operation: /model/layers.5/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.5/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.5/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] 
-------------------------------------------------- Operation: /model/layers.5/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.5/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.5/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.5/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.5/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.5/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] 
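The Pow -> ReduceMean -> Add -> Sqrt -> Div -> Mul -> Mul_1 chain (input_layernorm / post_attention_layernorm) is RMSNorm written out op by op. A sketch, assuming the usual epsilon placement; the actual eps constant is baked into the graph as an initializer and not visible in this dump:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """Pow -> ReduceMean -> Add -> Sqrt -> Div -> Mul -> Mul_1 chain from the trace.
    x: (b, s, 896), weight: (896,). The eps value is an assumption."""
    var = (x ** 2).mean(axis=-1, keepdims=True)  # Pow then ReduceMean -> (b, s, 1)
    inv = 1.0 / np.sqrt(var + eps)               # Add -> Sqrt -> Div
    return x * inv * weight                      # Mul (normalize), Mul_1 (scale)
```

This matches the shapes in the trace: every op between ReduceMean and Div carries ('batch_size', 'sequence_length', 1), and the final two Muls restore the 896-wide hidden axis.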
-------------------------------------------------- Operation: /model/layers.5/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.5/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.5/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.5/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.5/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.5/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.5/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.5/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] 
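gate_proj and up_proj followed by act_fn/Sigmoid, act_fn/Mul, mlp/Mul, and down_proj is the SwiGLU MLP: SiLU(gate(x)) * up(x), projected from 896 up to 4864 and back. A dequantized float sketch (weight names are placeholders, not graph identifiers):

```python
import numpy as np

def swiglu_mlp(x, w_gate, w_up, w_down):
    """gate_proj -> SiLU (Sigmoid then Mul) -> Mul with up_proj -> down_proj.
    Float sketch of the quantized chain in the trace; in this model the
    intermediate width is 4864 and the hidden width 896."""
    g = x @ w_gate                          # gate_proj
    u = x @ w_up                            # up_proj
    silu = g * (1.0 / (1.0 + np.exp(-g)))   # act_fn/Sigmoid then act_fn/Mul
    return (silu * u) @ w_down              # mlp/Mul then down_proj
```

The surrounding DynamicQuantizeLinear / MatMulInteger ops quantize each of these three matmuls independently; the float math above is what they approximate.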
-------------------------------------------------- Operation: /model/layers.5/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.5/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.5/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.6/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.6/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.6/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.6/input_layernorm/Mul (Mul) Inputs: [('batch_size', 
'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.6/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.6/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.6/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 
'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.6/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.6/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.6/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.6/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] 
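Every projection in this dump follows the same five-op dynamic-quantization pattern: DynamicQuantizeLinear on the activation, MatMulInteger against pre-quantized weights, a Cast of the integer accumulator to float, and two scale Muls (weight scale times activation scale, then applied to the output). A sketch of that pattern, assuming per-tensor scales and a zero weight zero-point:

```python
import numpy as np

def dynamic_quant_matmul(x, w_q, w_scale):
    """DynamicQuantizeLinear -> MatMulInteger -> Cast -> Mul(scales) pattern.
    x: float activations; w_q: pre-quantized int8 weights; w_scale: per-tensor
    weight scale. Activation quantization follows ONNX DynamicQuantizeLinear:
    asymmetric uint8 with the range adjusted to include zero."""
    lo, hi = min(x.min(), 0.0), max(x.max(), 0.0)
    x_scale = (hi - lo) / 255.0 or 1.0                          # avoid zero scale
    zp = np.clip(np.round(-lo / x_scale), 0, 255).astype(np.uint8)
    x_q = np.clip(np.round(x / x_scale) + zp, 0, 255).astype(np.uint8)
    # MatMulInteger: integer matmul with the activation zero-point subtracted
    acc = (x_q.astype(np.int32) - np.int32(zp)) @ w_q.astype(np.int32)
    # Cast to float, then the two scale Muls from the trace
    return acc.astype(np.float32) * (x_scale * w_scale)
```

This is why each projection shows a (), () pair alongside the quantized tensor: those are the per-tensor scale and zero-point scalars produced by DynamicQuantizeLinear.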
-------------------------------------------------- Operation: /model/layers.6/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] 
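The Reshape -> Transpose pairs above split the flat projections into heads: 896 -> 14 query heads of 64, and 128 -> 2 KV heads of 64. Sketch:

```python
import numpy as np

def split_heads(x, n_heads):
    """Reshape then Transpose: (b, s, n_heads * head_dim) -> (b, n_heads, s, head_dim)."""
    b, s, h = x.shape
    return x.reshape(b, s, n_heads, h // n_heads).transpose(0, 2, 1, 3)

q = np.zeros((1, 3, 896), dtype=np.float32)
print(split_heads(q, 14).shape)  # (1, 14, 3, 64)
```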
-------------------------------------------------- Operation: /model/layers.6/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] 
-------------------------------------------------- Operation: /model/layers.6/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- 
Operation: /model/layers.6/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 
'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] 
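The Slice -> Neg -> Concat ops (Slice_0..3, Neg, Neg_1, Concat_3/Concat_4) implement rotate_half, and the four Muls plus Add_1/Add_2 combine it with the cos/sin tables selected by rotary_emb/Slice and Gather_4/Gather_5: rotary position embeddings. A sketch with the half-split layout the trace uses (cos/sin carry a broadcastable head dim of 1, matching Unsqueeze_6/Unsqueeze_7):

```python
import numpy as np

def rotate_half(x):
    """Slice -> Neg -> Concat pattern: (-x2, x1) over the last-dim halves."""
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def apply_rope(q, k, cos, sin):
    """Mul/Mul_1 (cos path), Mul_2/Mul_3 (sin path), Add_1/Add_2 from the trace.
    cos, sin: (b, 1, s, 64), gathered per position from the rotary tables."""
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```

The 'Min(32768, past_sequence_length + sequence_length)' dimension on rotary_emb/Slice is the precomputed table being trimmed to the positions actually needed, up to the 32768-token maximum.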
-------------------------------------------------- Operation: /model/layers.6/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 
1', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.6/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.6/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.6/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.6/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.6/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.6/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 
14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.6/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.6/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.6/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.6/Add (Add) Inputs: [('batch_size', 
'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.6/post_attention_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.6/post_attention_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.6/post_attention_layernorm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.6/post_attention_layernorm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.6/post_attention_layernorm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.6/post_attention_layernorm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.6/post_attention_layernorm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.6/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.6/mlp/gate_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/gate_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.6/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/up_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/up_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.6/mlp/up_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/act_fn/Sigmoid (Sigmoid)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/act_fn/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.6/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864), (), ()]
--------------------------------------------------
Operation: /model/layers.6/mlp/down_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.6/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.6/mlp/down_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.6/mlp/down_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.6/Add_1 (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/input_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/input_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.7/input_layernorm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.7/input_layernorm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.7/input_layernorm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.7/input_layernorm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/input_layernorm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Shape (Shape)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [(3,)]
--------------------------------------------------
Operation: /model/layers.7/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/q_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
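The Pow → ReduceMean → Add → Sqrt → Div → Mul → Mul_1 chains above (e.g. `/model/layers.7/input_layernorm/*`) are RMSNorm exported as primitive ONNX ops: mean of squares over the 896-wide hidden axis, reciprocal square root, then a learned per-channel scale. A minimal numpy sketch of the same computation (the `eps` value and `weight` are placeholders, not values read from the graph):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Pow -> ReduceMean: mean of squares over the last (hidden, 896) axis
    ms = np.mean(x.astype(np.float32) ** 2, axis=-1, keepdims=True)
    # Add (eps) -> Sqrt -> Div -> Mul: divide by the root-mean-square
    normed = x / np.sqrt(ms + eps)
    # Mul_1: learned per-channel scale vector
    return normed * weight
```

After normalization the mean square of each hidden vector is approximately 1, which is why the shape trace shows the (batch, seq, 1) statistic being broadcast back against the (batch, seq, 896) activation.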
--------------------------------------------------
Operation: /model/layers.7/self_attn/q_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/k_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/k_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 128), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/v_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/v_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 128), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather (Gather)
Inputs: [(3,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_1 (Gather)
Inputs: [(3,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/q_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/k_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/v_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_1 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_1 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_11 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(3,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Reshape (Reshape)
Inputs: [('batch_size', 'sequence_length', 896), (4,)]
Outputs: [('batch_size', 'sequence_length', 14, 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Reshape_1 (Reshape)
Inputs: [('batch_size', 'sequence_length', 128), (4,)]
Outputs: [('batch_size', 'sequence_length', 2, 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Reshape_2 (Reshape)
Inputs: [('batch_size', 'sequence_length', 128), (4,)]
Outputs: [('batch_size', 'sequence_length', 2, 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Transpose (Transpose)
Inputs: [('batch_size', 'sequence_length', 14, 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Transpose_1 (Transpose)
Inputs: [('batch_size', 'sequence_length', 2, 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Transpose_2 (Transpose)
Inputs: [('batch_size', 'sequence_length', 2, 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Shape_2 (Shape)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_6 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Slice (Slice)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Slice_1 (Slice)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Slice_2 (Slice)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
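Every linear projection in this trace follows the same five-op dynamic-quantization pattern: DynamicQuantizeLinear turns the float activation into uint8 plus a per-tensor scale and zero point, MatMulInteger accumulates in int32 against the pre-quantized int8 weight, Cast returns the accumulator to float, and the two scale Muls (`*_scales_mul`, `*_output_scale_mul`) fold the activation and weight scales back in. A hedged numpy sketch of that pipeline, assuming a per-tensor weight scale (the real graph may use per-channel scales):

```python
import numpy as np

def dynamic_quantize_linear(x):
    # ONNX DynamicQuantizeLinear: per-tensor asymmetric uint8 quantization,
    # with the range forced to include zero
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0
    zp = int(np.clip(round(-lo / scale), 0, 255))
    q = np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)
    return q, np.float32(scale), zp

def quantized_matmul(x, w_q, w_scale):
    # w_q: pre-quantized int8 weight, w_scale: its (assumed per-tensor) scale
    x_q, x_scale, x_zp = dynamic_quantize_linear(x)
    # MatMulInteger: int32 accumulation after zero-point subtraction
    acc = (x_q.astype(np.int32) - x_zp) @ w_q.astype(np.int32)
    # Cast + scales_mul + output_scale_mul: rescale back to float
    return acc.astype(np.float32) * (x_scale * w_scale)
```

The two scalar `()` inputs seen on the scale Muls are exactly the `x_scale` and `w_scale` factors combined here in one step.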
--------------------------------------------------
Operation: /model/layers.7/self_attn/Slice_3 (Slice)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_2 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Shape_11 (Shape)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_21 (Unsqueeze)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Neg (Neg)
Inputs: [('batch_size', 14, 'sequence_length', 32)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Neg_1 (Neg)
Inputs: [('batch_size', 2, 'sequence_length', 32)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Add (Add)
Inputs: [(), ()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_12 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_14 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_3 (Concat)
Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_4 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/rotary_emb/Unsqueeze (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_22 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_24 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/rotary_emb/Slice (Slice)
Inputs: [(1,)]
Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/rotary_emb/Slice_1 (Slice)
Inputs: [(1,)]
Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_9 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_10 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_4 (Gather)
Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Outputs: [('batch_size', 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_5 (Gather)
Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Outputs: [('batch_size', 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Equal_1 (Equal)
Inputs: [(5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_6 (Unsqueeze)
Inputs: [('batch_size', 'sequence_length', 64)]
Outputs: [('batch_size', 1, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_7 (Unsqueeze)
Inputs: [('batch_size', 'sequence_length', 64)]
Outputs: [('batch_size', 1, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Where_1 (Where)
Inputs: [(5,), (5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Mul (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Mul_2 (Mul)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Mul_1 (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Mul_3 (Mul)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Expand_1 (Expand)
Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)]
Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Add_1 (Add)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Add_2 (Add)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Reshape_6 (Reshape)
Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)]
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_5 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Mul_8 (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Shape_6 (Shape)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_12 (Unsqueeze)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_8 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_10 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_13 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_15 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_7 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Concat_8 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
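The Slice/Neg/Concat ops on the 64-wide head axis, together with the cos/sin rows gathered from the `rotary_emb` tables (Gather_4/Gather_5) and the Mul/Add chains (Mul, Mul_1, Add_1 for the 14 query heads; Mul_2, Mul_3, Add_2 for the 2 key heads), implement rotary position embeddings. A sketch of the same math, assuming the standard half-rotation layout used by Qwen2-style models (the `base` constant is an assumption, not read from the graph):

```python
import numpy as np

def rotate_half(x):
    # Slice/Slice_1 -> Neg -> Concat: (x1, x2) becomes (-x2, x1) on the last axis
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def apply_rope(x, cos, sin):
    # Mul + Mul_1 -> Add_1: x * cos + rotate_half(x) * sin
    # cos/sin have shape (seq, head_dim) and broadcast over batch and heads
    return x * cos + rotate_half(x) * sin

def rope_tables(positions, head_dim=64, base=10000.0):
    # rotary_emb: per-pair frequencies, duplicated across both halves
    inv_freq = 1.0 / base ** (np.arange(0, head_dim // 2) / (head_dim // 2))
    freqs = np.outer(positions, inv_freq)
    emb = np.concatenate([freqs, freqs], axis=-1)
    return np.cos(emb), np.sin(emb)
```

Because each (i, i + 32) pair is a plane rotation, RoPE leaves the per-head vector norm unchanged while encoding the absolute position into the phase.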
--------------------------------------------------
Operation: /model/layers.7/self_attn/Equal (Equal)
Inputs: [(5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Where (Where)
Inputs: [(5,), (5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Expand (Expand)
Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)]
Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Reshape_4 (Reshape)
Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)]
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Shape_16 (Shape)
Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Transpose_3 (Transpose)
Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Gather_16 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Mul_9 (Mul)
Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Unsqueeze_30 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/MatMul (MatMul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
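The Unsqueeze → Expand → Reshape pattern on the cached keys and values ((batch, 2, past+1, 64) → (batch, 2, 7, past+1, 64) → (batch, 14, past+1, 64)) is grouped-query attention: the 2 KV heads are each tiled 7 times so every one of the 14 query heads sees a copy. An equivalent numpy sketch:

```python
import numpy as np

def repeat_kv(kv, n_rep=7):
    # Unsqueeze_21 -> Expand -> Reshape_4: tile each KV head n_rep times,
    # then fold the repeat axis into the head axis (2 * 7 = 14 heads)
    b, n_kv, s, d = kv.shape
    tiled = np.broadcast_to(kv[:, :, None, :, :], (b, n_kv, n_rep, s, d))
    return tiled.reshape(b, n_kv * n_rep, s, d)
```

The Equal/Where pair before the Expand only builds the target shape vector at runtime; the numerically relevant step is the broadcast itself.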
--------------------------------------------------
Operation: /model/layers.7/self_attn/Slice_4 (Slice)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Add_3 (Add)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Softmax (Softmax)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.7/self_attn/MatMul_1 (MatMul)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Transpose_4 (Transpose)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 'sequence_length', 14, 64)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Reshape_7 (Reshape)
Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/o_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/self_attn/o_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/Add (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/post_attention_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/post_attention_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.7/post_attention_layernorm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.7/post_attention_layernorm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.7/post_attention_layernorm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
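MatMul → Slice_4/Add_3 (mask) → Softmax → MatMul_1 → Transpose_4 → Reshape_7 is the attention core. In the graph the 1/sqrt(64) softmax scale appears to be split between Mul_8 (on Q) and Mul_9 (on the transposed K); the sketch below applies it once, which is numerically equivalent:

```python
import numpy as np

def attention(q, k, v, mask):
    # q: (b, 14, s, 64); k, v: (b, 14, kv_len, 64); mask: (b, 1, s, kv_len)
    # MatMul against Transpose_3's K^T, with the 1/sqrt(head_dim) scale folded in
    scores = (q @ k.transpose(0, 1, 3, 2)) / np.sqrt(q.shape[-1])
    scores = scores + mask                            # Add_3: additive causal mask
    scores = scores - scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)                            # numerically stable Softmax
    probs = probs / probs.sum(axis=-1, keepdims=True)
    out = probs @ v                                   # MatMul_1
    b, h, s, d = out.shape
    return out.transpose(0, 2, 1, 3).reshape(b, s, h * d)  # Transpose_4 + Reshape_7
```

The broadcast mask explains the (batch, 1, seq, past+1) shape on Slice_4: one mask is shared across all 14 heads via the Add.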
--------------------------------------------------
Operation: /model/layers.7/post_attention_layernorm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/post_attention_layernorm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.7/mlp/gate_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/gate_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/up_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/up_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.7/mlp/up_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/act_fn/Sigmoid (Sigmoid)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/act_fn/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.7/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864), (), ()]
--------------------------------------------------
Operation: /model/layers.7/mlp/down_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/mlp/down_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
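The gate_proj/up_proj/down_proj trio plus the act_fn Sigmoid/Mul pair form a SwiGLU MLP with the 896 → 4864 → 896 widths seen above: `silu(x @ W_gate) * (x @ W_up)` is projected back down by `W_down`. Ignoring the quantization wrappers, a float sketch (weight names are placeholders):

```python
import numpy as np

def silu(x):
    # act_fn/Sigmoid followed by act_fn/Mul: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # gate_proj and up_proj: 896 -> 4864; mlp/Mul gates one branch with the
    # other; down_proj: 4864 -> 896
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```

This is why each MLP runs two parallel 896 → 4864 quantized MatMuls before the single elementwise mlp/Mul.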
--------------------------------------------------
Operation: /model/layers.7/mlp/down_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.7/Add_1 (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.8/input_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.8/input_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.8/input_layernorm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.8/input_layernorm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.8/input_layernorm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.8/input_layernorm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.8/input_layernorm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/Shape (Shape)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [(3,)]
--------------------------------------------------
Operation: /model/layers.8/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.8/self_attn/q_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/q_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.8/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/k_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/k_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.8/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 128), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/v_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/v_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.8/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 128), ()]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/Gather (Gather)
Inputs: [(3,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.8/self_attn/Gather_1 (Gather)
Inputs: [(3,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.8/self_attn/q_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/k_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/v_proj/Add (Add)
Inputs: [('batch_size', 'sequence_length', 128)]
Outputs: [('batch_size', 'sequence_length', 128)]
--------------------------------------------------
Operation: /model/layers.8/self_attn/Unsqueeze (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
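The `'past_sequence_length + 1'` dimension produced by the Concat_5/Concat_6 ops in each attention block is the KV cache: during decoding (sequence_length = 1), the key/value for the current token is appended to the cached past along the sequence axis. A sketch of that bookkeeping:

```python
import numpy as np

def append_kv(past, new):
    # Concat_6: append this step's K (or V) to the cache on axis 2,
    # turning (b, 2, past_len, 64) into (b, 2, past_len + new_len, 64)
    if past is None:  # prompt processing: the cache starts from the prompt itself
        return new
    return np.concatenate([past, new], axis=2)
```

This is why the downstream Shape/Gather ops re-read the cache's dimensions every step: the sequence axis grows by one token per decode iteration.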
Operation: /model/layers.8/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.8/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)]
Operation: /model/layers.8/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)]
Operation: /model/layers.8/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)]
Operation: /model/layers.8/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)]
Operation: /model/layers.8/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Operation: /model/layers.8/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)]
Operation: /model/layers.8/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)]
Operation: /model/layers.8/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)]
Operation: /model/layers.8/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)]
Operation: /model/layers.8/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.8/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
Operation: /model/layers.8/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)]
Operation: /model/layers.8/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)]
Operation: /model/layers.8/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()]
Operation: /model/layers.8/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.8/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.8/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.8/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.8/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.8/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Operation: /model/layers.8/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Operation: /model/layers.8/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)]
Operation: /model/layers.8/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)]
Operation: /model/layers.8/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)]
Operation: /model/layers.8/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
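The Slice/Neg/Concat nodes on Q and K, together with the rotary_emb Slice, Gather_4/Gather_5 (cos/sin lookup) and the Mul/Add pairs, implement rotary position embeddings in the HF "rotate_half" form. A NumPy sketch of what this subgraph computes (the cos/sin table construction below is an assumption matching the usual Qwen2 export, base 10000):

```python
import numpy as np

def rotate_half(x):
    # The Slice/Slice_1 -> Neg -> Concat_3 subgraph: split the 64-wide head
    # dimension into two 32-wide halves and swap them, negating the second.
    half = x.shape[-1] // 2
    return np.concatenate((-x[..., half:], x[..., :half]), axis=-1)

def apply_rope(q, k, cos, sin):
    # Gather_4/Gather_5 pick the cos/sin rows for the current positions,
    # Unsqueeze_6/Unsqueeze_7 add the broadcast head axis, then the
    # Mul/Mul_1/Add_1 (and Mul_2/Mul_3/Add_2 for K) nodes combine them.
    cos, sin = cos[:, None, :, :], sin[:, None, :, :]  # (batch, 1, seq, 64)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```

Because each (x_i, x_{i+32}) pair is rotated by a pure rotation, the per-head vector norms are unchanged; only relative phase between positions is encoded.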
Operation: /model/layers.8/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
Operation: /model/layers.8/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Operation: /model/layers.8/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Operation: /model/layers.8/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.8/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
Operation: /model/layers.8/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.8/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.8/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.8/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.8/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)]
Operation: /model/layers.8/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)]
Operation: /model/layers.8/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)]
Operation: /model/layers.8/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
Operation: /model/layers.8/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Operation: /model/layers.8/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)]
Operation: /model/layers.8/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
Operation: /model/layers.8/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()]
Operation: /model/layers.8/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
Operation: /model/layers.8/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.8/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.8/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.8/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.8/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
Operation: /model/layers.8/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
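The Unsqueeze/Expand/Reshape trio (2 KV heads tiled 7x to match 14 query heads) followed by MatMul, the mask Add, Softmax, and MatMul_1 is grouped-query attention. A NumPy sketch of the computation these nodes express; the graph splits the 1/sqrt(64) scaling across Mul_8 (on Q) and Mul_9 (on K^T), which is folded into a single division here:

```python
import numpy as np

def repeat_kv(x, n_rep):
    # Unsqueeze_12 -> Expand -> Reshape_4 (and the Unsqueeze_21/Expand_1/
    # Reshape_6 copy for V): tile each KV head n_rep times so the cache
    # lines up with the query heads (grouped-query attention).
    b, h_kv, t, d = x.shape
    x = np.broadcast_to(x[:, :, None, :, :], (b, h_kv, n_rep, t, d))
    return x.reshape(b, h_kv * n_rep, t, d)

def attention(q, k, v, mask):
    # MatMul -> Add_3 (causal mask from Slice_4) -> Softmax -> MatMul_1.
    k = repeat_kv(k, q.shape[1] // k.shape[1])
    v = repeat_kv(v, q.shape[1] // v.shape[1])
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1]) + mask
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable Softmax
    return (e / e.sum(axis=-1, keepdims=True)) @ v
```

The Equal/Where pair feeding Expand is the exporter's way of computing the target shape at runtime, since `past_sequence_length` is only known when the graph executes.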
Operation: /model/layers.8/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)]
Operation: /model/layers.8/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()]
Operation: /model/layers.8/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.8/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/layers.8/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/layers.8/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/layers.8/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/layers.8/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()]
Operation: /model/layers.8/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)]
Operation: /model/layers.8/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)]
Operation: /model/layers.8/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.8/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)]
Operation: /model/layers.8/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)]
Operation: /model/layers.8/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)]
Operation: /model/layers.8/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.8/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)]
Operation: /model/layers.8/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)]
Operation: /model/layers.8/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)]
Operation: /model/layers.8/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)]
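The mlp subgraph (gate_proj and up_proj expanding 896 to 4864, act_fn/Sigmoid plus act_fn/Mul forming SiLU, mlp/Mul gating, down_proj back to 896) is a SwiGLU-style feed-forward block. A NumPy sketch of the dataflow, with the quantized MatMulInteger chains shown as plain float matmuls for clarity:

```python
import numpy as np

def swiglu_mlp(x, w_gate, w_up, w_down):
    # gate_proj/up_proj (hidden -> intermediate), act_fn/Sigmoid + act_fn/Mul
    # (= SiLU: g * sigmoid(g)), mlp/Mul (gate * up), then down_proj
    # (intermediate -> hidden).
    gate = x @ w_gate
    up = x @ w_up
    silu = gate / (1.0 + np.exp(-gate))  # g * sigmoid(g), rewritten
    return (silu * up) @ w_down
```

In the real graph each of the three matmuls is preceded by its own DynamicQuantizeLinear, so activations are re-quantized between the norm, the gating, and the down projection.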
Operation: /model/layers.8/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()]
Operation: /model/layers.8/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.8/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.8/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.9/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.9/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/layers.9/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/layers.9/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/layers.9/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)]
Operation: /model/layers.9/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.9/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.9/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)]
Operation: /model/layers.9/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()]
Operation: /model/layers.9/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.9/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.9/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.9/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.9/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.9/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.9/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.9/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.9/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.9/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.9/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()]
Operation: /model/layers.9/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.9/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()]
Operation: /model/layers.9/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()]
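The input_layernorm / post_attention_layernorm chains (Pow, ReduceMean, Add, Sqrt, Div, then two Mul nodes) are RMSNorm: no mean subtraction, just scaling by the reciprocal root-mean-square plus a learned per-channel weight. A NumPy sketch; the epsilon value 1e-6 is an assumption (it lives in the Add node's constant, which the dump does not show):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Pow (square) -> ReduceMean -> Add (epsilon) -> Sqrt -> Div (reciprocal)
    # -> Mul (by x) -> Mul_1 (by the learned weight vector).
    variance = np.mean(np.square(x), axis=-1, keepdims=True)
    return weight * (x / np.sqrt(variance + eps))
```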
Operation: /model/layers.9/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)]
Operation: /model/layers.9/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.9/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)]
Operation: /model/layers.9/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.9/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)]
Operation: /model/layers.9/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/layers.9/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)]
Operation: /model/layers.9/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)]
Operation: /model/layers.9/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)]
Operation: /model/layers.9/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)]
Operation: /model/layers.9/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)]
Operation: /model/layers.9/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)]
Operation: /model/layers.9/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.9/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)]
Operation: /model/layers.9/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)]
Operation: /model/layers.9/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Operation: /model/layers.9/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)]
Operation: /model/layers.9/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)]
Operation: /model/layers.9/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)]
Operation: /model/layers.9/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)]
Operation: /model/layers.9/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()]
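The Concat_6 (keys) and Concat_5 (values) nodes are the KV cache update: they append the current step's projected K/V to the cached tensors along the sequence axis, which is why the output's dynamic dimension reads 'past_sequence_length + 1' during single-token decoding. A trivial NumPy sketch:

```python
import numpy as np

def append_to_kv_cache(past, new):
    # Concat along axis 2 (the sequence axis of the (batch, kv_heads, seq,
    # head_dim) layout), turning 'sequence_length' into
    # 'past_sequence_length + sequence_length' (+1 when decoding one token).
    return np.concatenate((past, new), axis=2)
```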
-------------------------------------------------- Operation: /model/layers.9/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_22 (Unsqueeze) 
Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: 
[(5,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] 
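The Slice/Neg/Concat_3 ops above split each 64-wide head into two 32-wide halves, negate the second half, and concatenate the halves in swapped order; Mul/Mul_1/Add_1 (and Mul_2/Mul_3/Add_2 for the 2 KV heads) then combine that with the gathered cos/sin rows, broadcast over the head axis by Unsqueeze_6/Unsqueeze_7. This is the rotate_half formulation of rotary position embeddings. A minimal NumPy sketch of that reading (function names are illustrative, not taken from the graph):

```python
import numpy as np

def rotate_half(x):
    # Slice/Slice_1 + Neg + Concat_3: split the last (head_dim = 64) axis
    # into two 32-wide halves, negate the second, swap their order.
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def apply_rope(x, cos, sin):
    # Mul + Mul_1 + Add_1: x * cos + rotate_half(x) * sin,
    # with cos/sin broadcast over the head axis.
    return x * cos + rotate_half(x) * sin
```

With cos = 1 and sin = 0 (position 0) the input passes through unchanged, which makes a quick sanity check.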
-------------------------------------------------- Operation: /model/layers.9/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] 
-------------------------------------------------- Operation: /model/layers.9/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.9/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.9/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.9/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.9/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- 
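The Unsqueeze_21/Unsqueeze_12 → Expand → Reshape_4/Reshape_6 chains above turn the 2 cached key/value heads into the 14 heads the query side uses (a repeat factor of 7): this is the KV-head repetition step of grouped-query attention. A sketch under that reading:

```python
import numpy as np

def repeat_kv(kv, n_rep):
    # Unsqueeze: (b, 2, s, 64) -> (b, 2, 1, s, 64)
    # Expand:    -> (b, 2, n_rep, s, 64)   (n_rep = 7 in this graph)
    # Reshape:   -> (b, 2 * n_rep, s, 64) = (b, 14, s, 64)
    b, kv_heads, seq, dim = kv.shape
    expanded = np.broadcast_to(kv[:, :, None], (b, kv_heads, n_rep, seq, dim))
    return expanded.reshape(b, kv_heads * n_rep, seq, dim)
```

Each group of 7 consecutive output heads is a copy of one KV head, so the 14 query heads can attend against a cache that stores only 2 heads.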
Operation: /model/layers.9/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.9/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.9/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.9/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] 
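The MatMul → Add_3 (mask) → Softmax → MatMul_1 run above is the scaled-dot-product-attention core, with Mul_8/Mul_9 apparently splitting the 1/sqrt(64) scale between Q and the transposed K, and Transpose_4/Reshape_7 folding the 14 heads back into the 896-wide hidden. A NumPy sketch that applies the scale in one place (a simplification for readability, not the graph's exact op order):

```python
import numpy as np

def sdpa(q, k, v, mask):
    # MatMul: (b, h, s, d) @ (b, h, d, kv_len) -> attention scores;
    # Mul_8/Mul_9 in the graph split this 1/sqrt(d) scale across Q and K^T.
    scores = (q @ k.transpose(0, 1, 3, 2)) / np.sqrt(q.shape[-1])
    scores = scores + mask                        # Add_3: additive causal mask
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable Softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    out = w @ v                                   # MatMul_1
    b, h, s, d = out.shape
    # Transpose_4 + Reshape_7: (b, h, s, d) -> (b, s, h * d)
    return out.transpose(0, 2, 1, 3).reshape(b, s, h * d)
```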
-------------------------------------------------- Operation: /model/layers.9/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.9/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.9/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.9/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.9/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.9/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.9/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) 
Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.9/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.9/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.9/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.9/Add_1 (Add) Inputs: [('batch_size', 
'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.10/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.10/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.10/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.10/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.10/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 
896), (), ()] -------------------------------------------------- Operation: /model/layers.10/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.10/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.10/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.10/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] 
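Each projection above (q_proj/k_proj/v_proj, and likewise o_proj and the MLP's gate/up/down) follows the same four-op pattern: DynamicQuantizeLinear computes a uint8 activation tensor plus its scale and zero point on the fly, MatMulInteger multiplies it against the integer weight, Cast converts the int32 accumulator to float, and the two scale Muls fold the activation and weight scales back in. A simplified per-tensor sketch (the zero-point handling is folded into the matmul here, which is an assumption about the exact op wiring):

```python
import numpy as np

def dynamic_quant_matmul(a, w_q, w_scale):
    # DynamicQuantizeLinear: per-tensor uint8 quantization of the activation,
    # with scale and zero point derived from the observed min/max.
    lo, hi = min(a.min(), 0.0), max(a.max(), 0.0)
    a_scale = (hi - lo) / 255.0 or 1.0
    a_zp = np.round(-lo / a_scale).astype(np.int32)
    a_q = np.clip(np.round(a / a_scale) + a_zp, 0, 255).astype(np.int32)
    # MatMulInteger + Cast: integer matmul with the zero point removed,
    # accumulated in int32, then cast to float.
    acc = (a_q - a_zp) @ w_q.astype(np.int32)
    # scales_mul + output_scale_mul: multiply both scales back in.
    return acc.astype(np.float32) * (a_scale * w_scale)
```

The result approximates the float matmul to within quantization error, while the heavy inner product runs entirely in integer arithmetic.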
-------------------------------------------------- Operation: /model/layers.10/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.10/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.10/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: 
/model/layers.10/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: 
/model/layers.10/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: 
[()] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] 
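The rotary_emb/Slice ops above trim precomputed cos/sin tables down to Min(32768, past_sequence_length + sequence_length) rows, and Gather_4/Gather_5 then pick out the rows for the current position ids. A sketch of how such a table is typically built and indexed (the base 10000.0 and all names here are assumptions, not values read from the graph):

```python
import numpy as np

def rope_tables(max_pos, head_dim=64, base=10000.0):
    # Precompute one cos row and one sin row per position.
    inv_freq = 1.0 / base ** (np.arange(0, head_dim, 2) / head_dim)
    angles = np.outer(np.arange(max_pos), inv_freq)   # (max_pos, 32)
    emb = np.concatenate([angles, angles], axis=-1)   # (max_pos, 64)
    return np.cos(emb), np.sin(emb)

# Gather_4/Gather_5: index the tables with the position ids of the current
# step; during decoding that is a single id, past_sequence_length.
cos_tab, sin_tab = rope_tables(128)
position_ids = np.array([[5]])                        # (batch, seq) = (1, 1)
cos, sin = cos_tab[position_ids], sin_tab[position_ids]
```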
-------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: 
/model/layers.10/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_10 
(Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.10/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.10/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.10/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.10/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.10/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.10/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.10/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.10/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- 
Operation: /model/layers.10/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.10/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- 
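Every projection in this trace follows the same four-node dynamic-quantization expansion: DynamicQuantizeLinear on the float activation, MatMulInteger against the pre-quantized weight, a Cast of the int32 accumulator to float, and a Mul by the product of the activation and weight scales. A NumPy sketch of that decomposition, assuming per-tensor uint8 quantization as ONNX's DynamicQuantizeLinear defines it (the weight is quantized here with the same uint8 helper purely for illustration; real exports typically store int8 weights):

```python
import numpy as np

def dynamic_quantize_linear(x):
    # ONNX DynamicQuantizeLinear: per-tensor uint8, scale/zero-point
    # computed from the tensor's own range (which must include 0).
    rmin, rmax = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (rmax - rmin) / 255.0
    zp = int(np.clip(round(-rmin / scale), 0, 255))
    q = np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)
    return q, np.float32(scale), zp

def quantized_matmul(x, w_q, w_scale, w_zp=0):
    # Mirrors DynamicQuantizeLinear -> MatMulInteger -> Cast -> scale Mul.
    x_q, x_scale, x_zp = dynamic_quantize_linear(x)
    # MatMulInteger: integer matmul with zero-points removed, int32 accum.
    acc = (x_q.astype(np.int32) - x_zp) @ (w_q.astype(np.int32) - w_zp)
    # Cast + MatMul_quant_scales_mul + MatMul_quant_output_scale_mul:
    return acc.astype(np.float32) * (x_scale * w_scale)
```

This is why each MatMul in the log appears as a cluster of five ops (quantize, integer matmul, cast, scales mul, output-scale mul) rather than a single node.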
Operation: /model/layers.10/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.10/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.10/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.10/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.10/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.10/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.10/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- 
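The MLP ops above (gate_proj and up_proj to width 4864, act_fn/Sigmoid plus act_fn/Mul forming SiLU, mlp/Mul gating, then down_proj back to 896) are the SwiGLU feed-forward pattern. A float-only NumPy sketch, with the dimensions taken from the dump and the quantized projections shown as plain matmuls for clarity (weight names here are placeholders, not graph initializer names):

```python
import numpy as np

def swiglu_mlp(x, w_gate, w_up, w_down):
    # x: (..., 896); w_gate, w_up: (896, 4864); w_down: (4864, 896)
    g = x @ w_gate                              # gate_proj
    u = x @ w_up                                # up_proj
    # act_fn/Sigmoid + act_fn/Mul implement SiLU: g * sigmoid(g)
    h = (g * (1.0 / (1.0 + np.exp(-g)))) * u    # mlp/Mul: SiLU(gate) * up
    return h @ w_down                           # down_proj
```

Because SiLU(0) = 0, the gated product vanishes wherever the gate activation is zero, which is the point of the elementwise mlp/Mul between the two branches.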
Operation: /model/layers.10/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.10/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.10/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.11/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.11/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.11/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] 
Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.11/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.11/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.11/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 
128)] -------------------------------------------------- Operation: /model/layers.11/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.11/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.11/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.11/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.11/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] 
-------------------------------------------------- Operation: /model/layers.11/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.11/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] 
-------------------------------------------------- Operation: /model/layers.11/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] 
-------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] 
-------------------------------------------------- Operation: /model/layers.11/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), 
('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 
'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 
'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.11/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.11/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.11/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.11/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.11/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: 
[('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.11/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.11/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.11/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.11/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.11/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.11/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.11/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.11/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.11/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.11/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/act_fn/Sigmoid 
(Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.11/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.11/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.11/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.11/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.12/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.12/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.12/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.12/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.12/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.12/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.12/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 
896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.12/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.12/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.12/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.12/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 
'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.12/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.12/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] 
Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: 
/model/layers.12/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.12/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 
'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] 
-------------------------------------------------- Operation: /model/layers.12/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] 
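The `Slice`/`Neg`/`Concat_3`/`Concat_4` ops followed by the `Mul`/`Mul_1`/`Mul_2`/`Mul_3` and `Add_1`/`Add_2` pairs above are the "rotate-half" form of rotary position embeddings (RoPE): each 64-wide head is split into two 32-wide halves, the second half is negated and moved in front, and the result is combined with per-position cos/sin tables (the `rotary_emb` `Gather_4`/`Gather_5` outputs, unsqueezed to broadcast over the head axis). A minimal numpy sketch, assuming precomputed `cos`/`sin` tables:

```python
import numpy as np

def rotate_half(x):
    # Slice + Neg + Concat: split head_dim=64 into two 32-wide halves,
    # negate the second half and swap: [x1, x2] -> [-x2, x1]
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, k, cos, sin):
    # q: (batch, 14, seq, 64); k: (batch, 2, seq, 64)
    # cos/sin: (batch, 1, seq, 64), broadcast over the head axis
    q_rot = q * cos + rotate_half(q) * sin   # Mul / Mul_1 / Add_1
    k_rot = k * cos + rotate_half(k) * sin   # Mul_2 / Mul_3 / Add_2
    return q_rot, k_rot
```

With `cos = 1` and `sin = 0` (position 0), the rotation is the identity, which is a convenient sanity check.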
-------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.12/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] 
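The `Unsqueeze_12`/`Expand`/`Reshape_4` chain (and the parallel `Unsqueeze_21`/`Expand_1`/`Reshape_6` chain) replicates the 2 key/value heads to match the 14 query heads, i.e. grouped-query attention with a group size of 7 — which is why the intermediate shape reads `('batch_size', 2, 7, 'past_sequence_length + 1', 64)`. A sketch of the same expand-and-flatten in numpy:

```python
import numpy as np

def repeat_kv(kv, n_rep=7):
    # kv: (batch, 2, total_seq, 64) -> (batch, 14, total_seq, 64)
    b, kv_heads, t, d = kv.shape
    kv = kv[:, :, None, :, :]                             # Unsqueeze: (b, 2, 1, t, 64)
    kv = np.broadcast_to(kv, (b, kv_heads, n_rep, t, d))  # Expand over the new axis
    return kv.reshape(b, kv_heads * n_rep, t, d)          # Reshape: (b, 14, t, 64)
```

Each of the 14 resulting heads is a view-copy of one of the 2 original KV heads (heads 0-6 come from KV head 0, heads 7-13 from KV head 1).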
-------------------------------------------------- Operation: /model/layers.12/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.12/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.12/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.12/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.12/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.12/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.12/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] 
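`MatMul` → `Add_3` (additive mask) → `Softmax` → `MatMul_1` → `Transpose_4` → `Reshape_7` is plain scaled-dot-product attention; `Transpose_3` produces K^T, and the 1/sqrt(64) scale appears to be split between `Mul_8` (on Q) and `Mul_9` (on K^T). The sketch below applies the scale once, which is numerically equivalent:

```python
import numpy as np

def attention(q, k, v, mask):
    # q: (b, 14, seq, 64); k, v: (b, 14, total, 64); mask: (b, 1, seq, total), additive
    scale = 1.0 / np.sqrt(64)
    scores = (q * scale) @ np.swapaxes(k, -1, -2) + mask   # Mul_8/Mul_9 + MatMul + Add_3
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)                  # Softmax over the key axis
    out = probs @ v                                        # MatMul_1: (b, 14, seq, 64)
    out = np.swapaxes(out, 1, 2)                           # Transpose_4: (b, seq, 14, 64)
    return out.reshape(out.shape[0], out.shape[1], -1)     # Reshape_7: (b, seq, 896)
```

Because the softmax rows sum to 1, feeding in a constant V returns that constant exactly — a quick correctness check.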
-------------------------------------------------- Operation: /model/layers.12/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.12/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.12/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] 
-------------------------------------------------- Operation: /model/layers.12/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.12/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.12/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.12/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.12/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.12/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] 
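Each `input_layernorm`/`post_attention_layernorm` block is RMSNorm decomposed into primitives: `Pow(x, 2)` → `ReduceMean` → `Add(eps)` → `Sqrt` → `Div` → `Mul(x)` → `Mul_1(weight)`. A sketch, with the epsilon value assumed (1e-6 is the Qwen2 default; the actual constant is an initializer not shown in the dump):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # x: (batch, seq, 896); weight: (896,)
    variance = np.mean(x ** 2, axis=-1, keepdims=True)  # Pow + ReduceMean -> (b, s, 1)
    x = x / np.sqrt(variance + eps)                     # Add + Sqrt + Div + Mul
    return x * weight                                   # Mul_1: learned per-channel gain
```

Unlike LayerNorm, no mean is subtracted; with a unit weight, the output's root-mean-square per position is ~1.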
-------------------------------------------------- Operation: /model/layers.12/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.12/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.12/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.12/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.12/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.12/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.12/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.12/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] 
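The MLP is the SwiGLU pattern: `act_fn/Sigmoid` plus `act_fn/Mul` compute SiLU of the `gate_proj` output, `mlp/Mul` multiplies that elementwise by the `up_proj` output, and `down_proj` maps 4864 back to 896. A float sketch (the quantized MatMuls replaced by plain matmuls for clarity):

```python
import numpy as np

def silu(x):
    # act_fn/Sigmoid + act_fn/Mul: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def mlp(x, w_gate, w_up, w_down):
    # x: (b, s, 896); w_gate/w_up: (896, 4864); w_down: (4864, 896)
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down  # mlp/Mul + down_proj
```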
-------------------------------------------------- Operation: /model/layers.12/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.12/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.12/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.13/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.13/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.13/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.13/input_layernorm/Mul (Mul) Inputs: [('batch_size', 
'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.13/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.13/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.13/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 
'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.13/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.13/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.13/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.13/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] 
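Every projection in the dump follows the same dynamic-quantization pattern: `DynamicQuantizeLinear` produces a uint8 activation plus per-tensor scale and zero point (the three outputs), `MatMulInteger` does the integer multiply against the pre-quantized weights, `Cast` widens the int32 accumulator to float, and the two `Mul` nodes apply `activation_scale * weight_scale`. A numpy sketch of the end-to-end computation, assuming a per-tensor symmetric int8 weight scale (the real model may quantize weights per-channel):

```python
import numpy as np

def dynamic_quantize(x):
    # DynamicQuantizeLinear: asymmetric uint8, per-tensor; range must include 0
    lo, hi = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (hi - lo) / 255.0 or 1.0          # avoid div-by-zero for all-zero input
    zp = np.uint8(np.clip(round(-lo / scale), 0, 255))
    q = np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)
    return q, scale, zp

def quant_matmul(x, w_q, w_scale, w_zp=0):
    x_q, x_scale, x_zp = dynamic_quantize(x)
    # MatMulInteger: int32 accumulation with zero points subtracted
    acc = (x_q.astype(np.int32) - np.int32(x_zp)) @ (w_q.astype(np.int32) - np.int32(w_zp))
    # Cast + the two scale Muls: acc * (x_scale * w_scale)
    return acc.astype(np.float32) * np.float32(x_scale * w_scale)
```

The result approximates the float matmul to within the activation quantization step, which is why the graph re-quantizes dynamically before every projection rather than reusing a fixed calibration scale.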
-------------------------------------------------- Operation: /model/layers.13/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] 
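The `Reshape`/`Reshape_1`/`Reshape_2` plus `Transpose`/`Transpose_1`/`Transpose_2` pairs split the flat projections into heads (896 → 14 query heads × 64, 128 → 2 KV heads × 64) and move the head axis ahead of the sequence axis for batched attention. A one-liner sketch:

```python
import numpy as np

def split_heads(x, n_heads, head_dim=64):
    # (batch, seq, n_heads * head_dim) -> (batch, n_heads, seq, head_dim)
    b, s, _ = x.shape
    return x.reshape(b, s, n_heads, head_dim).transpose(0, 2, 1, 3)
```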
-------------------------------------------------- Operation: /model/layers.13/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] 
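`Concat_6` (and later `Concat_5`) concatenates the current step's keys/values onto the cached past along the sequence axis, which is why its output dimension reads `'past_sequence_length + 1'`: during decoding `sequence_length` is 1, so each step grows the cache by one position. A one-line sketch:

```python
import numpy as np

# cached keys: (batch, kv_heads, past_len, head_dim)
past_k = np.zeros((1, 2, 10, 64))
# decode step: sequence_length = 1
new_k = np.zeros((1, 2, 1, 64))

# Concat_6: append along axis 2 -> past_sequence_length + 1
k = np.concatenate([past_k, new_k], axis=2)
```

During prompt processing the same node instead appends a whole block of `sequence_length` positions at once.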
-------------------------------------------------- Operation: /model/layers.13/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] 
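The `Slice`/`Slice_1` → `Neg` → `Concat_3` pattern on the 64-wide head dimension (and its `Slice_2`/`Slice_3` → `Neg_1` → `Concat_4` twin for keys) is the "rotate_half" half of rotary position embedding; the `rotary_emb/Slice` nodes fetch the cos/sin tables up to the current position. A minimal sketch, assuming the usual `q*cos + rotate_half(q)*sin` formulation:

```python
import numpy as np

def rotate_half(x):
    # Slice / Slice_1: split the 64-dim head into two 32-dim halves;
    # Neg negates the second half; Concat_3 reassembles as [-x2, x1]
    x1, x2 = x[..., :32], x[..., 32:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, cos, sin):
    # Mul + Mul_1 + Add_1 in the graph: q * cos + rotate_half(q) * sin
    return q * cos + rotate_half(q) * sin
```

Because each (x1_j, x2_j) pair is rotated by the same angle, the transform preserves per-head vector norms, which the test below checks.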
-------------------------------------------------- Operation: /model/layers.13/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Mul_2 (Mul) Inputs: 
[('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: 
[('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] 
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.13/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.13/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.13/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.13/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.13/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.13/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 
'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.13/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.13/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] 
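Putting the attention nodes together: `Unsqueeze_21`/`Unsqueeze_12` → `Expand`/`Expand_1` → `Reshape_4`/`Reshape_6` replicate the 2 cached KV heads 7 times each to match the 14 query heads, `Transpose_3` plus `Mul_9` form the scaled Kᵀ, and `MatMul` → `Add_3` (additive mask from `Slice_4`) → `Softmax` → `MatMul_1` → `Transpose_4` → `Reshape_7` produce the (b, s, 896) context fed to o_proj. A NumPy sketch of that data flow (folding the 1/√64 scale into Kᵀ is an assumption about what `Mul_9` multiplies by):

```python
import numpy as np

def repeat_kv(x, n_rep=7):
    # Unsqueeze -> Expand -> Reshape: (b, 2, L, 64) -> (b, 2, 7, L, 64) -> (b, 14, L, 64)
    b, kv, L, d = x.shape
    return np.broadcast_to(x[:, :, None], (b, kv, n_rep, L, d)).reshape(b, kv * n_rep, L, d)

def attention(q, k, v, mask):
    k, v = repeat_kv(k), repeat_kv(v)
    # Transpose_3 + Mul_9: scaled K^T; MatMul: attention scores
    scores = q @ (k.transpose(0, 1, 3, 2) / np.sqrt(64.0))
    scores = scores + mask                        # Add_3: additive causal mask
    p = np.exp(scores - scores.max(-1, keepdims=True))
    p /= p.sum(-1, keepdims=True)                 # Softmax over key positions
    out = p @ v                                   # MatMul_1
    # Transpose_4 + Reshape_7: (b, 14, s, 64) -> (b, s, 896)
    b, h, s, d = out.shape
    return out.transpose(0, 2, 1, 3).reshape(b, s, h * d)
```

Note that heads 0-6 of the expanded keys are identical copies of KV head 0, which is exactly the `(b, 2, 7, L, 64)` intermediate the `Expand` nodes report.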
-------------------------------------------------- Operation: /model/layers.13/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.13/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.13/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.13/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.13/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 
'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.13/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.13/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.13/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.13/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.13/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.13/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.13/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: 
/model/layers.13/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.13/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.13/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.13/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.13/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.13/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.14/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.14/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.14/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.14/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.14/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.14/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.14/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) 
Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.14/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.14/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.14/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.14/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.14/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] 
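Each `post_attention_layernorm` and `input_layernorm` block above is the same six-node RMSNorm chain: `Pow` (square) → `ReduceMean` over the 896-dim axis → `Add` (epsilon) → `Sqrt` → `Div` → `Mul` (normalize) → `Mul_1` (learned weight). A minimal sketch; the epsilon value is an assumption (a typical default, not read from the graph):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Pow -> ReduceMean: mean of squares over the hidden axis, shape (b, s, 1)
    var = np.mean(np.power(x, 2.0), axis=-1, keepdims=True)
    # Add(eps) -> Sqrt -> Div -> Mul: divide by the root-mean-square
    x = x * (1.0 / np.sqrt(var + eps))
    # Mul_1: elementwise learned scale, shape (896,)
    return x * weight
```

Unlike LayerNorm there is no mean subtraction and no bias, which is why the graph has no `Sub` node in these chains.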
-------------------------------------------------- Operation: /model/layers.14/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.14/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] 
Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- 
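The `mlp` subgraph listed earlier (gate_proj and up_proj expanding 896 → 4864, `act_fn/Sigmoid` + `act_fn/Mul` forming SiLU, `mlp/Mul` gating, down_proj contracting back to 896) is a standard SwiGLU feed-forward block. A sketch in plain floats, ignoring the quantization wrappers around each projection:

```python
import numpy as np

def silu(x):
    # act_fn/Sigmoid followed by act_fn/Mul: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def mlp(x, w_gate, w_up, w_down):
    # gate_proj / up_proj: (b, s, 896) -> (b, s, 4864); mlp/Mul gates the
    # up branch; down_proj: (b, s, 4864) -> (b, s, 896)
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down
```

In the actual graph each of the three matmuls is the quantized `MatMulInteger` pattern shown earlier, with a `DynamicQuantizeLinear` re-quantizing the activations before down_proj.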
Operation: /model/layers.14/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] 
-------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] 
-------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: 
[('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: 
/model/layers.14/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.14/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.14/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.14/self_attn/MatMul 
(MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.14/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.14/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.14/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.14/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 
896), (), ()] -------------------------------------------------- Operation: /model/layers.14/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.14/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.14/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] 
-------------------------------------------------- Operation: /model/layers.14/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.14/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.14/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 
'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.14/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.14/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: 
[('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.14/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.14/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.15/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.15/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.15/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.15/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.15/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.15/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.15/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.15/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: 
/model/layers.15/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.15/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.15/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.15/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.15/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 
'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Transpose_2 (Transpose) Inputs: 
[('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Neg (Neg) Inputs: [('batch_size', 
14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, 
past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] 
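The `Slice`/`Neg`/`Concat` pairs together with the cos/sin cache `Slice`s, the position `Gather`s, and the `Mul`/`Mul_1`/`Add_1` (and `Mul_2`/`Mul_3`/`Add_2`) chains above are a rotary position embedding applied to both Q (14 heads) and K (2 heads). A sketch under the assumption that this follows the common "rotate_half" convention (the half-split `Slice` ops at offset 32 of the 64-dim head are consistent with it):

```python
import numpy as np

def rotate_half(x):
    # Slice/Slice_1 + Neg + Concat_3: swap the two 32-dim halves, negating the second
    half = x.shape[-1] // 2
    return np.concatenate((-x[..., half:], x[..., :half]), axis=-1)

def apply_rope(q, k, cos, sin):
    """q: (batch, 14, seq, 64); k: (batch, 2, seq, 64);
    cos/sin: (batch, seq, 64), gathered from the precomputed cache by position."""
    cos = cos[:, None, :, :]   # Unsqueeze_6/Unsqueeze_7: broadcast over the head axis
    sin = sin[:, None, :, :]
    q_rot = q * cos + rotate_half(q) * sin   # Mul, Mul_1, Add_1
    k_rot = k * cos + rotate_half(k) * sin   # Mul_2, Mul_3, Add_2
    return q_rot, k_rot
```

Because cos²θ + sin²θ = 1 and each (i, i+32) pair shares an angle, the transform is a per-pair rotation and preserves vector norms.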
-------------------------------------------------- Operation: /model/layers.15/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Shape_6 (Shape) 
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: 
/model/layers.15/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.15/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.15/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.15/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.15/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.15/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.15/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] 
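The `Unsqueeze_12` → `Expand` → `Reshape_4` chain above is grouped-query attention's KV expansion: the 2 cached key/value heads are tiled 7× along a new axis and flattened to match the 14 query heads, after which the usual `MatMul` → mask `Add_3` → `Softmax` → `MatMul_1` → `Transpose_4` → `Reshape_7` attention path runs. A numpy sketch of those steps (folding the separate `Mul_8`/`Mul_9` scaling ops into a single 1/√64 factor, which is an assumption about how the exporter split the scale):

```python
import numpy as np

def repeat_kv(x, n_rep=7):
    # Unsqueeze_12 -> Expand -> Reshape: (b, 2, L, 64) -> (b, 14, L, 64)
    b, kv_heads, L, d = x.shape
    x = np.broadcast_to(x[:, :, None, :, :], (b, kv_heads, n_rep, L, d))
    return x.reshape(b, kv_heads * n_rep, L, d)

def gqa_attention(q, k, v, mask):
    """q: (b, 14, S, 64); k, v: (b, 2, L, 64); mask: (b, 1, S, L), additive."""
    k, v = repeat_kv(k), repeat_kv(v)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])  # MatMul + scale
    scores = scores + mask                                       # Add_3 (broadcast over heads)
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w = w / w.sum(-1, keepdims=True)                             # Softmax over the key axis
    out = w @ v                                                  # MatMul_1
    # Transpose_4 + Reshape_7: (b, 14, S, 64) -> (b, S, 896)
    return out.transpose(0, 2, 1, 3).reshape(q.shape[0], q.shape[2], -1)
```

During decoding S = 1 while L = past_sequence_length + 1, which matches the `('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')` score shape in the dump.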
-------------------------------------------------- Operation: /model/layers.15/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.15/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 
'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.15/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.15/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.15/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.15/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.15/mlp/gate_proj/MatMul_quant (MatMulInteger) 
Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 
'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.15/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.15/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.15/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.15/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] 
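Each mlp block in the trace (gate_proj and up_proj expanding 896 → 4864, act_fn/Sigmoid plus act_fn/Mul forming SiLU, mlp/Mul gating, down_proj back to 896, then the residual Add_1) is the SwiGLU feed-forward pattern. A dequantized sketch with placeholder weights standing in for the model's, quantization wrappers omitted:

```python
import numpy as np

def silu(x):
    # act_fn/Sigmoid followed by act_fn/Mul: x * sigmoid(x)
    return x * (1.0 / (1.0 + np.exp(-x)))

def swiglu_mlp(x, w_gate, w_up, w_down):
    """x: (B, S, 896); w_gate, w_up: (896, 4864); w_down: (4864, 896)."""
    h = silu(x @ w_gate) * (x @ w_up)   # mlp/Mul gates the up projection
    return x + h @ w_down               # layers.N/Add_1 is the residual add
```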
-------------------------------------------------- Operation: /model/layers.16/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.16/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.16/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.16/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.16/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.16/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- 
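The Pow → ReduceMean → Add → Sqrt → Div → Mul → Mul_1 chain that opens every input_layernorm and post_attention_layernorm is RMSNorm: divide by the root-mean-square over the 896 features (plus an epsilon) and apply a learned per-feature scale. Sketch; the eps value is an assumption, not read from the graph:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """x: (B, S, 896); weight: (896,). eps is assumed."""
    ms = (x ** 2).mean(axis=-1, keepdims=True)  # Pow + ReduceMean
    x = x / np.sqrt(ms + eps)                   # Add + Sqrt + Div + Mul
    return x * weight                           # Mul_1 applies the learned scale
```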
Operation: /model/layers.16/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.16/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.16/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.16/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.16/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.16/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] 
Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.16/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 
'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] 
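The Slice/Slice_1 and Slice_2/Slice_3 pairs above cut each 64-wide head into two 32-wide halves; together with the Neg, Concat_3/Concat_4, and cos/sin Mul/Add ops that follow, they implement rotate-half rotary position embedding (RoPE) on Q and K. Sketch, assuming the (S, 64) cos/sin tables gathered from the rotary_emb slices:

```python
import numpy as np

def rotate_half(x):
    """(..., 64) -> concat(-x[..., 32:], x[..., :32]): the Slice + Neg + Concat ops."""
    return np.concatenate([-x[..., 32:], x[..., :32]], axis=-1)

def apply_rope(q, cos, sin):
    """q: (B, H, S, 64); cos, sin: (S, 64), broadcast over batch and heads."""
    return q * cos + rotate_half(q) * sin   # the Mul/Mul_1 and Add_1/Add_2 in the trace
```

Because each frequency appears in both halves, the rotation preserves the norm of every head vector.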
-------------------------------------------------- Operation: /model/layers.16/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), 
('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 
'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 
'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: 
/model/layers.16/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.16/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.16/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.16/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 
'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.16/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.16/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.16/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.16/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.16/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: 
[('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.16/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.16/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.16/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 
'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.16/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.16/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 
'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.16/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.16/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.16/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] 
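Every projection in the trace is wrapped in the same five-op pattern: DynamicQuantizeLinear quantizes the activation to uint8 with a runtime per-tensor scale and zero-point, MatMulInteger multiplies it against the pre-quantized weight in int32, Cast converts the accumulator to float, and the two Mul nodes (quant_scales_mul, then output_scale_mul) apply activation_scale * weight_scale. A simplified per-tensor sketch; the real graph also carries weight zero-points, which are dropped here:

```python
import numpy as np

def dynamic_quantize(x):
    """DynamicQuantizeLinear: uint8 scale/zero-point from the runtime range of x."""
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)  # range must include 0
    scale = (hi - lo) / 255.0
    scale = scale if scale > 0.0 else 1.0
    zp = int(np.clip(round(-lo / scale), 0, 255))
    q = np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)
    return q, scale, zp

def quant_matmul(x, w_q, w_scale):
    """MatMulInteger + Cast + the two scale Muls; w_q is an int8 weight."""
    x_q, x_scale, x_zp = dynamic_quantize(x)
    acc = (x_q.astype(np.int32) - x_zp) @ w_q.astype(np.int32)  # int32 accumulate
    return acc.astype(np.float32) * (x_scale * w_scale)         # dequantize the output
```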
-------------------------------------------------- Operation: /model/layers.16/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.16/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.17/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.17/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.17/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.17/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.17/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.17/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.17/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.17/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.17/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 
'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.17/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.17/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.17/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.17/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] 
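The Pow → ReduceMean → Add → Sqrt → Div → Mul → Mul_1 chain under every `input_layernorm` and `post_attention_layernorm` is RMSNorm decomposed into ONNX primitives: square, mean over the 896-dim axis (keepdims, hence the `('batch_size', 'sequence_length', 1)` shapes), add epsilon, reciprocal square root, scale by the input, then by the learned weight. A sketch, with an illustrative epsilon:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Pow(x, 2) -> ReduceMean over the last axis, keepdims=True
    mean_sq = np.mean(x ** 2, axis=-1, keepdims=True)
    # Add(eps) -> Sqrt -> Div(1 / .): reciprocal RMS, shape (..., 1)
    inv_rms = 1.0 / np.sqrt(mean_sq + eps)
    # Mul(x, inv_rms) -> Mul_1(weight): back to (batch, seq, 896)
    return x * inv_rms * weight

x = np.random.default_rng(1).standard_normal((1, 3, 896)).astype(np.float32)
y = rms_norm(x, np.ones(896, dtype=np.float32))
```

With a unit weight, each output vector has RMS ≈ 1, which is the invariant the chain exists to enforce.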
-------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 
'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 
32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- 
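Earlier in the layer, the Reshape → Transpose pairs split the flat projections into per-head layout: queries go `(batch, seq, 896)` → `(batch, 14, seq, 64)`, and keys/values `(batch, seq, 128)` → `(batch, 2, seq, 64)`. A sketch of that split:

```python
import numpy as np

def split_heads(x, num_heads, head_dim=64):
    """Reshape + Transpose: (batch, seq, num_heads * head_dim) ->
    (batch, num_heads, seq, head_dim)."""
    b, s, _ = x.shape
    x = x.reshape(b, s, num_heads, head_dim)   # Reshape with the Concat'd target shape
    return x.transpose(0, 2, 1, 3)             # Transpose, perm (0, 2, 1, 3)

# 14 query heads vs. 2 key/value heads, all with head_dim 64
q = split_heads(np.zeros((1, 5, 896)), num_heads=14)
k = split_heads(np.zeros((1, 5, 128)), num_heads=2)
```

The `(4,)` shape tensors fed to each Reshape are exactly the Concat of the dynamic batch/seq dims with the constant head counts seen in the Unsqueeze/Concat entries above.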
Operation: /model/layers.17/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] 
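The Slice/Neg/Concat_3/Concat_4 ops implement rotate_half, and the rotary_emb Slice + Gather_4/Gather_5 + Mul/Mul_1/Add_1 (and Mul_2/Mul_3/Add_2 for keys) apply rotary position embeddings: q' = q·cos + rotate_half(q)·sin. A sketch, assuming the standard inverse-frequency table (the base 10000.0 is illustrative; the actual table is baked into the model's rotary_emb constants):

```python
import numpy as np

def rotate_half(x):
    # Slice the 64-dim head in half, Neg the second half, Concat swapped
    x1, x2 = x[..., :32], x[..., 32:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, k, positions, base=10000.0, dim=64):
    inv_freq = 1.0 / base ** (np.arange(0, dim, 2) / dim)   # (32,)
    ang = np.outer(positions, inv_freq)                     # (seq, 32)
    ang = np.concatenate([ang, ang], axis=-1)               # (seq, 64)
    cos, sin = np.cos(ang), np.sin(ang)                     # Gather_4 / Gather_5 rows
    # Mul + Mul_1 + Add_1 for q; Mul_2 + Mul_3 + Add_2 for k
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin
```

Each (i, i+32) pair of channels is a 2-D rotation, so RoPE changes angles but preserves every head vector's norm.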
-------------------------------------------------- Operation: /model/layers.17/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 
64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 
'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.17/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.17/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.17/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.17/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.17/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.17/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.17/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 
'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.17/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 
'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.17/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.17/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.17/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.17/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.17/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.17/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 
'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.17/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.17/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.17/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.17/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.17/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.17/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.17/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] 
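Layer 17's attention core above is the eager-mode pattern: Q·Kᵀ via MatMul (the Mul_8 on Q and Mul_9 on transposed K appear to split the 1/√64 scaling across both operands), an additive mask sliced to `past_sequence_length + 1` (Slice_4, Add_3), Softmax, the weighted sum over V (MatMul_1), then Transpose_4 + Reshape_7 to merge heads back to `(batch, seq, 896)`. A numpy sketch of that chain:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, mask):
    """q: (b, 14, s, 64); k, v: (b, 14, total, 64); mask: (b, 1, s, total)."""
    d = q.shape[-1]
    # Mul_8 / Mul_9: 1/sqrt(64) applied as 1/64**0.25 on each operand
    scores = (q / d ** 0.25) @ (k / d ** 0.25).transpose(0, 1, 3, 2)
    probs = softmax(scores + mask, axis=-1)        # Add_3 + Softmax
    out = probs @ v                                # MatMul_1
    # Transpose_4 + Reshape_7: (b, 14, s, 64) -> (b, s, 896)
    b, h, s, _ = out.shape
    return out.transpose(0, 2, 1, 3).reshape(b, s, h * d)
```

Splitting the scale across Q and Kᵀ is numerically equivalent to dividing the scores by √64 once.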
-------------------------------------------------- Operation: /model/layers.17/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.17/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.17/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.17/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.18/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- 
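Each `mlp` block in the dump is SwiGLU: gate_proj and up_proj expand 896 → 4864, the gate passes through SiLU (the Sigmoid + Mul pair under `act_fn`), the two branches are multiplied elementwise (`mlp/Mul`), and down_proj brings the result back to 896. Ignoring the quantization wrappers, a sketch:

```python
import numpy as np

def silu(x):
    # act_fn: Sigmoid followed by Mul with the input, i.e. x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    gate = x @ w_gate                    # (b, s, 896) -> (b, s, 4864)
    up = x @ w_up                        # (b, s, 896) -> (b, s, 4864)
    return (silu(gate) * up) @ w_down    # mlp/Mul, then down to (b, s, 896)
```

The residual Add_1 that follows each down_proj in the dump adds this output back onto the block input.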
Operation: /model/layers.18/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.18/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.18/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.18/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.18/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 
'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.18/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.18/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.18/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.18/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.18/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.18/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.18/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] 
Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: 
/model/layers.18/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] 
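The Slice → Neg → Concat chains above (Slice/Slice_1 split each 64-dim head into two 32-dim halves, Neg negates the second half, Concat_3/Concat_4 glue them back in swapped order) are the `rotate_half` helper of rotary position embeddings, written out in primitive ops. A minimal numpy sketch (function name is mine, not from the graph):

```python
import numpy as np

def rotate_half(x: np.ndarray) -> np.ndarray:
    """Mirror the Slice -> Neg -> Concat ops: (..., [a, b]) -> (..., [-b, a])."""
    half = x.shape[-1] // 2          # 32 for the 64-dim heads in this graph
    a, b = x[..., :half], x[..., half:]
    return np.concatenate([-b, a], axis=-1)  # Neg feeds the first Concat input
```

The same chain appears twice per layer: once on the 14-head query tensor and once on the 2-head key tensor.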
-------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] 
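The rotary_emb/Slice ops trim the precomputed cos/sin tables to `Min(32768, past_sequence_length + sequence_length)` rows, Gather_4/Gather_5 pick the rows for the current position ids, and Unsqueeze_6/Unsqueeze_7 add the broadcast head axis; the Mul/Mul_1/Add_1 ops that follow combine them with the rotated halves as q·cos + rotate_half(q)·sin. A sketch of that combination (a hypothetical helper, assuming cos/sin are already gathered per position):

```python
import numpy as np

def apply_rope(q, cos, sin):
    """q: (batch, heads, seq, 64); cos/sin: (batch, seq, 64) gathered by position id.
    Mirrors Unsqueeze_6/_7 -> Mul/Mul_1 -> Add_1: q*cos + rotate_half(q)*sin."""
    half = q.shape[-1] // 2
    rot = np.concatenate([-q[..., half:], q[..., :half]], axis=-1)  # rotate_half
    cos = cos[:, None, :, :]   # Unsqueeze: broadcast over the head axis
    sin = sin[:, None, :, :]
    return q * cos + rot * sin
```

The graph runs this twice per layer, on queries (14 heads) and keys (2 heads), with the same cos/sin tensors broadcast over both.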
-------------------------------------------------- Operation: /model/layers.18/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 
7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: 
/model/layers.18/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.18/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.18/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.18/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.18/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 
'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.18/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.18/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.18/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.18/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 
896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.18/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.18/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.18/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.18/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] 
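The Pow → ReduceMean → Add → Sqrt → Div → Mul chain that closes the record above is RMSNorm decomposed into primitives: square, mean over the 896 channels, add epsilon, root, reciprocal, rescale, with the trailing Mul_1 applying the learned weight. A sketch (the epsilon value is an assumption, 1e-6 being the usual Qwen2 default; it is not visible in this dump, and the graph computes the reciprocal via Div then multiplies, which is mathematically equivalent to the division below):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    """x: (batch, seq, 896); weight: (896,). One graph op per commented step."""
    var = np.mean(x ** 2, axis=-1, keepdims=True)  # Pow + ReduceMean -> (b, s, 1)
    x = x / np.sqrt(var + eps)                     # Add + Sqrt + Div + Mul
    return x * weight                              # Mul_1: learned scale
```

Each layer runs this twice (input_layernorm and post_attention_layernorm), which is why the same six-op chain recurs throughout the dump.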
-------------------------------------------------- Operation: /model/layers.18/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.18/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.18/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.18/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.18/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.18/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.18/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- 
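The gate_proj → act_fn/Sigmoid → act_fn/Mul → (×up_proj) → down_proj sequence above is the SwiGLU MLP: Sigmoid plus the first Mul implement SiLU(x) = x·sigmoid(x) on the 4864-wide gate, mlp/Mul multiplies it elementwise with the up projection, and down_proj maps 4864 back to 896. Ignoring the quantization wrappers, the float equivalent is roughly (weight names are mine):

```python
import numpy as np

def swiglu_mlp(x, w_gate, w_up, w_down):
    """x: (batch, seq, 896); w_gate/w_up: (896, 4864); w_down: (4864, 896)."""
    gate = x @ w_gate                              # mlp/gate_proj
    gate = gate * (1.0 / (1.0 + np.exp(-gate)))    # act_fn: SiLU = x * sigmoid(x)
    return (gate * (x @ w_up)) @ w_down            # mlp/Mul, then mlp/down_proj
```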
Operation: /model/layers.18/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.19/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.19/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.19/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.19/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.19/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 
'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.19/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.19/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.19/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.19/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 
'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.19/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.19/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.19/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] 
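Every projection in this dump (the q/k/v_proj records above included) is wrapped in the same five-op dynamic-quantization pattern: DynamicQuantizeLinear computes a per-tensor scale and zero-point for the float activation at run time, MatMulInteger multiplies the uint8 activation against the pre-quantized int8 weight in int32, Cast converts the accumulator to float, and the two Mul ops fold the activation and weight scales back in. A numpy sketch of the arithmetic (a simplification: per-tensor symmetric weight quantization is assumed, and rounding details may differ from onnxruntime's exact kernels):

```python
import numpy as np

def dynamic_quant_matmul(x, w_q, w_scale):
    """x: float activations; w_q: int8 weights; w_scale: per-tensor weight scale."""
    # DynamicQuantizeLinear: asymmetric uint8 range that always includes zero
    x_min, x_max = min(x.min(), 0.0), max(x.max(), 0.0)
    x_scale = (x_max - x_min) / 255.0 or 1.0       # avoid /0 for all-zero input
    x_zp = np.clip(np.round(-x_min / x_scale), 0, 255).astype(np.int32)
    x_q = np.clip(np.round(x / x_scale) + x_zp, 0, 255).astype(np.int32)
    # MatMulInteger + Cast: int32 accumulate with zero-point subtraction
    acc = (x_q - x_zp) @ w_q.astype(np.int32)
    # The two trailing Mul ops: activation scale * weight scale
    return acc.astype(np.float32) * (x_scale * w_scale)
```

The `_quant_scales_mul` records with scalar `()` shapes are the scale product; the `_output_scale_mul` records apply it to the full activation tensor.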
-------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] 
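The Reshape → Transpose pairs above split the flat projections into heads: queries go (batch, seq, 896) → (batch, seq, 14, 64) → (batch, 14, seq, 64), while keys and values go 128 → 2 heads of 64 (the 14-vs-2 head count is what makes this grouped-query attention). As a sketch:

```python
import numpy as np

def split_heads(x, num_heads):
    """Reshape (b, s, h*d) -> (b, s, h, d), then Transpose to (b, h, s, d)."""
    b, s, hidden = x.shape
    head_dim = hidden // num_heads       # 64 throughout this graph
    return x.reshape(b, s, num_heads, head_dim).transpose(0, 2, 1, 3)
```

The Concat/Gather/Unsqueeze records feeding each Reshape merely assemble the target shape vector at run time from the dynamic batch and sequence dimensions.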
-------------------------------------------------- Operation: /model/layers.19/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.19/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + 
sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] 
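Downstream of the RoPE multiplies above, every layer runs the same masked-attention core seen in layer 18: MatMul forms Q·Kᵀ over the cached length (with Mul_8/Mul_9 applying the 1/√64 scaling, split across Q and Kᵀ in this export), Slice_4 trims the causal mask, Add_3 adds it, Softmax normalizes over `past_sequence_length + 1` positions, and MatMul_1 weights the values. A sketch with the scaling folded into one place (names are mine; the mask is assumed additive, 0 or a large negative):

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """q: (b, h, s_q, 64); k, v: (b, h, s_kv, 64); mask: (b, 1, s_q, s_kv)."""
    scores = (q @ k.transpose(0, 1, 3, 2)) / np.sqrt(q.shape[-1])  # MatMul + scale
    scores = scores + mask                        # Add_3: broadcast over heads
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # Softmax
    return w @ v                                  # MatMul_1
```

During decoding, s_q is 1 while s_kv is the full `past_sequence_length + 1`, which matches the asymmetric shapes in the MatMul records.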
-------------------------------------------------- Operation: /model/layers.19/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] 
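The Unsqueeze_12/Unsqueeze_21 → Expand → Reshape chains above are the GQA repeat_kv step: each of the 2 cached KV heads gains a fifth axis, is broadcast 7× along it, and is flattened back to 14 heads so every query head has a matching key/value head. Sketch:

```python
import numpy as np

def repeat_kv(kv, n_rep=7):
    """kv: (batch, 2, seq, 64) -> (batch, 14, seq, 64)."""
    b, kv_heads, s, d = kv.shape
    kv = kv[:, :, None, :, :]                             # Unsqueeze: (b, 2, 1, s, d)
    kv = np.broadcast_to(kv, (b, kv_heads, n_rep, s, d))  # Expand
    return kv.reshape(b, kv_heads * n_rep, s, d)          # Reshape_4 / Reshape_6
```

The Shape/Gather/Concat/Equal/Where records around each Expand only build its target-shape vector (substituting runtime dims where the stored shape has placeholders); the data movement is entirely in Expand and Reshape.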
-------------------------------------------------- Operation: /model/layers.19/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.19/self_attn/Gather_16 (Gather) Inputs: 
[(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.19/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.19/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.19/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.19/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.19/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.19/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 
'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.19/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 
'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.19/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.19/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.19/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.19/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.19/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] 
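Every projection MatMul in the trace is wrapped in the same four-node quantization pattern: DynamicQuantizeLinear on the activations, MatMulInteger against pre-quantized weights, a Cast of the int32 accumulator to float, and a Mul by the product of the activation and weight scales (the `*_quant_scales_mul` / `*_quant_output_scale_mul` nodes). A minimal NumPy sketch of that dataflow, assuming per-tensor uint8 activations and symmetric int8 weights; this illustrates the arithmetic, it is not ONNX Runtime's kernel:

```python
import numpy as np

def dynamic_quantize_linear(x):
    # Per-tensor asymmetric uint8 quantization over a range that
    # includes zero, as DynamicQuantizeLinear computes at run time.
    lo, hi = min(x.min(), 0.0), max(x.max(), 0.0)
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = np.clip(np.round(-lo / scale), 0, 255).astype(np.uint8)
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, np.float32(scale), zero_point

def quantized_matmul(x, w_q, w_scale, w_zp=0):
    # x: float activations; w_q: weights already quantized offline.
    x_q, x_scale, x_zp = dynamic_quantize_linear(x)
    # MatMulInteger accumulates in int32 after zero-point subtraction.
    acc = (x_q.astype(np.int32) - np.int32(x_zp)) @ \
          (w_q.astype(np.int32) - np.int32(w_zp))
    # Cast to float, then rescale by the product of the two scales
    # (the scales_mul and output_scale_mul nodes in the trace).
    return acc.astype(np.float32) * (x_scale * w_scale)
```

The result approximates the float MatMul up to activation-quantization error, which is why the surrounding Add (bias) and residual nodes stay in floating point.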
-------------------------------------------------- Operation: /model/layers.19/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.19/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), 
(), ()] -------------------------------------------------- Operation: /model/layers.19/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.19/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.19/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.20/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.20/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: 
/model/layers.20/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.20/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.20/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.20/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: 
[('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.20/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.20/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.20/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.20/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.20/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 
896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.20/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: 
[('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 
64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] 
Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 
'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] 
Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: 
[('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.20/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.20/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.20/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.20/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.20/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 
'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.20/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.20/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.20/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] 
-------------------------------------------------- Operation: /model/layers.20/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.20/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.20/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.20/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.20/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] 
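The Pow → ReduceMean → Add → Sqrt → Div → Mul → Mul_1 chain that appears above as `input_layernorm` and `post_attention_layernorm` is RMSNorm over the 896-wide hidden axis. A sketch under that reading; `eps` and the learned `weight` vector (the second Mul's constant input) are assumptions of the sketch, not values read from the model:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # x: (batch, seq, 896); weight: (896,)
    variance = np.mean(x ** 2, axis=-1, keepdims=True)  # Pow + ReduceMean
    inv = 1.0 / np.sqrt(variance + eps)                 # Add + Sqrt + Div
    return (x * inv) * weight                           # Mul + Mul_1
```

Note there is no mean subtraction and no bias, which is why the trace shows no ReduceMean-of-x or Sub node in these chains, unlike standard LayerNorm.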
-------------------------------------------------- Operation: /model/layers.20/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.20/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.20/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.20/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.20/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.20/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.20/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] 
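The `gate_proj` and `up_proj` matmuls above both expand 896 to 4864; together with the `act_fn/Sigmoid`, `act_fn/Mul`, `mlp/Mul`, and `down_proj` entries that follow, they form a SwiGLU-style MLP: `down(silu(gate(x)) * up(x))`, where SiLU appears in the graph as an explicit Sigmoid followed by a Mul. A float sketch with random stand-in weights (the real ones are quantized, as shown by the surrounding MatMulInteger pattern):

```python
import numpy as np

rng = np.random.default_rng(2)
hidden, intermediate = 896, 4864
x = rng.standard_normal((1, 4, hidden)).astype(np.float32)

# Hypothetical float stand-ins for the quantized gate/up/down weights.
w_gate = (rng.standard_normal((hidden, intermediate)) * 0.02).astype(np.float32)
w_up   = (rng.standard_normal((hidden, intermediate)) * 0.02).astype(np.float32)
w_down = (rng.standard_normal((intermediate, hidden)) * 0.02).astype(np.float32)

gate = x @ w_gate                              # gate_proj: 896 -> 4864
up = x @ w_up                                  # up_proj:   896 -> 4864
silu = gate * (1.0 / (1.0 + np.exp(-gate)))    # act_fn: Sigmoid then Mul
out = (silu * up) @ w_down                     # mlp/Mul, then down_proj: 4864 -> 896
```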
-------------------------------------------------- Operation: /model/layers.20/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.20/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.20/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.20/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.20/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.20/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.20/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 
896)] -------------------------------------------------- Operation: /model/layers.21/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.21/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.21/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.21/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.21/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.21/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: 
/model/layers.21/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.21/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.21/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.21/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: 
/model/layers.21/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.21/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.21/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_1 (Concat) Inputs: [(1,), 
(1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 
'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.21/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: 
/model/layers.21/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 
'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] 
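The `Unsqueeze_12`/`Unsqueeze_21 -> Expand -> Reshape_4`/`Reshape_6` runs above are grouped-query attention's `repeat_kv`: the 2 cached KV heads are broadcast across a new group axis of size 7 and merged into 14 heads so they line up with the 14 query heads (the `Shape`/`Gather`/`Concat`/`Equal`/`Where` ops around them just assemble the target shape for `Expand`). A NumPy sketch:

```python
import numpy as np

batch, kv_heads, groups, seq, head_dim = 1, 2, 7, 5, 64   # 14 query heads = 2 KV heads * 7

rng = np.random.default_rng(3)
k = rng.standard_normal((batch, kv_heads, seq, head_dim)).astype(np.float32)

# Unsqueeze: insert a group axis after the KV-head axis -> (b, 2, 1, L, 64)
k5 = k[:, :, None, :, :]
# Expand: broadcast each KV head across its group of 7 query heads -> (b, 2, 7, L, 64)
k5 = np.broadcast_to(k5, (batch, kv_heads, groups, seq, head_dim))
# Reshape: merge (2, 7) into 14 attention heads -> (b, 14, L, 64)
k14 = k5.reshape(batch, kv_heads * groups, seq, head_dim)
```

Heads 0-6 all read KV head 0 and heads 7-13 read KV head 1, so the cache stays 7x smaller than a full multi-head cache.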
-------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.21/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/Mul_9 (Mul) Inputs: 
[('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.21/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.21/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.21/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.21/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.21/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.21/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Reshape_7 
(Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.21/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: 
/model/layers.21/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.21/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.21/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.21/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.21/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.21/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/act_fn/Sigmoid (Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.21/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- 
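The attention core for layer 21 appears above as `MatMul` (scores) -> `Slice_4`/`Add_3` (additive mask) -> `Softmax` -> `MatMul_1` (context). The `Mul_8`/`Mul_9` ops scale Q and the transposed K separately; applying the full `1/sqrt(head_dim)` once, as in the sketch below, is numerically equivalent. The causal-mask construction is an assumption for illustration; in the graph the mask arrives precomputed and `Slice_4` merely crops it to the current score shape.

```python
import numpy as np

batch, heads, q_len, kv_len, head_dim = 1, 14, 3, 5, 64

rng = np.random.default_rng(4)
q = rng.standard_normal((batch, heads, q_len, head_dim)).astype(np.float32)
k = rng.standard_normal((batch, heads, kv_len, head_dim)).astype(np.float32)
v = rng.standard_normal((batch, heads, kv_len, head_dim)).astype(np.float32)

# Additive causal mask over past + current positions: 0 where attendable, -inf elsewhere.
mask = np.zeros((batch, 1, q_len, kv_len), dtype=np.float32)
past = kv_len - q_len
for i in range(q_len):
    mask[:, :, i, past + i + 1:] = -np.inf

# Scores (MatMul with the 1/sqrt(d) scale) plus the broadcast mask (Add_3).
scores = (q @ k.transpose(0, 1, 3, 2)) / np.sqrt(head_dim) + mask

# Softmax over the key axis, computed stably.
probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

context = probs @ v   # MatMul_1: (b, 14, q_len, 64)
```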
Operation: /model/layers.21/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.21/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.21/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.22/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.22/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.22/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] 
Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.22/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.22/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.22/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 
128)] -------------------------------------------------- Operation: /model/layers.22/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.22/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.22/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.22/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.22/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] 
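The q/k/v projections above all follow the same five-node dynamic-quantization pattern: `DynamicQuantizeLinear` quantizes the activation to uint8 with a per-tensor scale and zero point, `MatMulInteger` accumulates in int32 against pre-quantized weights, `Cast` converts to float, and the two `Mul` nodes fold the activation and weight scales back in. A minimal NumPy sketch of that pattern (weight layout and symmetric int8 weight quantization are assumptions, not read from the graph):

```python
import numpy as np

def dynamic_quantize_linear(x):
    """Per-tensor uint8 dynamic quantization, as in ONNX DynamicQuantizeLinear."""
    # The range must include 0 so that zero is exactly representable.
    lo, hi = min(float(x.min()), 0.0), max(float(x.max()), 0.0)
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = np.uint8(np.clip(round(-lo / scale), 0, 255))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, np.float32(scale), zero_point

def quantized_matmul(x, w_q, w_scale):
    """DynamicQuantizeLinear -> MatMulInteger -> Cast -> scale Muls."""
    x_q, x_scale, x_zp = dynamic_quantize_linear(x)
    # MatMulInteger: subtract the zero point, accumulate in int32.
    acc = (x_q.astype(np.int32) - np.int32(x_zp)) @ w_q.astype(np.int32)
    # The two scalar Mul nodes: scales multiplied together, then applied
    # to the Cast output.
    return acc.astype(np.float32) * (x_scale * w_scale)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 896)).astype(np.float32)        # (batch, seq, hidden)
w = rng.standard_normal((896, 896)).astype(np.float32) * 0.02  # hypothetical q_proj weight
w_scale = np.abs(w).max() / 127.0                              # assumed symmetric int8
w_q = np.clip(np.round(w / w_scale), -127, 127).astype(np.int8)

y = quantized_matmul(x, w_q, w_scale)
print(np.abs(y - x @ w).max())  # small quantization error
```

The per-projection bias (`q_proj/Add` etc.) is applied after dequantization, which is why it appears as a separate float `Add` node in the graph.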
-------------------------------------------------- Operation: /model/layers.22/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.22/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Reshape (Reshape) Inputs: [('batch_size', 'sequence_length', 896), (4,)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Reshape_1 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Reshape_2 (Reshape) Inputs: [('batch_size', 'sequence_length', 128), (4,)] Outputs: [('batch_size', 'sequence_length', 2, 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Transpose (Transpose) Inputs: [('batch_size', 'sequence_length', 14, 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] 
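The `Reshape`/`Transpose` pairs above split the flat projections into attention heads: 896 becomes 14 query heads of dimension 64, while the 128-wide k/v projections become only 2 heads, i.e. grouped-query attention. A sketch of those two nodes, with sizes read off the graph:

```python
import numpy as np

# Sizes read off the graph: hidden 896, 14 query heads, 2 KV heads, head_dim 64.
batch, seq, hidden, n_heads, n_kv_heads, head_dim = 1, 8, 896, 14, 2, 64

rng = np.random.default_rng(0)
q = rng.standard_normal((batch, seq, hidden)).astype(np.float32)                 # q_proj output
k = rng.standard_normal((batch, seq, n_kv_heads * head_dim)).astype(np.float32)  # k_proj output

# Reshape: (batch, seq, 896) -> (batch, seq, 14, 64), then
# Transpose perm=(0, 2, 1, 3): -> (batch, 14, seq, 64).
q = q.reshape(batch, seq, n_heads, head_dim).transpose(0, 2, 1, 3)
k = k.reshape(batch, seq, n_kv_heads, head_dim).transpose(0, 2, 1, 3)

print(q.shape, k.shape)  # (1, 14, 8, 64) (1, 2, 8, 64)
```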
-------------------------------------------------- Operation: /model/layers.22/self_attn/Transpose_1 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Transpose_2 (Transpose) Inputs: [('batch_size', 'sequence_length', 2, 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Shape_2 (Shape) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_6 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Slice (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Slice_1 (Slice) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Slice_2 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Slice_3 (Slice) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_2 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/Shape_11 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] 
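`Concat_6` above is the KV-cache append: the current keys `(batch, 2, sequence_length, 64)` are concatenated onto the cached past along axis 2. During decoding `sequence_length` is 1, which is exactly why the symbolic output dimension reads `past_sequence_length + 1`. A one-step sketch:

```python
import numpy as np

# One decode step: the new k has sequence_length == 1, so the cache grows
# to 'past_sequence_length + 1' along axis 2, matching Concat_6's output shape.
batch, n_kv_heads, past_len, head_dim = 1, 2, 16, 64
rng = np.random.default_rng(0)
past_k = rng.standard_normal((batch, n_kv_heads, past_len, head_dim)).astype(np.float32)
new_k = rng.standard_normal((batch, n_kv_heads, 1, head_dim)).astype(np.float32)

k_cache = np.concatenate([past_k, new_k], axis=2)
print(k_cache.shape)  # (1, 2, 17, 64)
```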
-------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_21 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Neg (Neg) Inputs: [('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Neg_1 (Neg) Inputs: [('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 32)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Add (Add) Inputs: [(), ()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_12 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_14 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_3 (Concat) Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_4 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/rotary_emb/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_22 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_24 (Unsqueeze) Inputs: [()] Outputs: [(1,)] 
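The `Slice`/`Neg`/`Concat_3` cluster above is the rotate-half used by rotary position embeddings: each 64-wide head vector is split into halves of 32, the second half is negated, and the halves are swapped; the later `Mul`/`Mul_1`/`Add_1` nodes then combine `x*cos + rotate_half(x)*sin`. A sketch of both steps (the frequency base 10000 is a common default assumed here for illustration; the actual table is baked into the model's `rotary_emb` constants):

```python
import numpy as np

def rotate_half(x):
    # Slice / Slice_1 / Neg / Concat_3: (x1, x2) -> (-x2, x1) on the last axis,
    # with each half 32 wide (head_dim // 2), as in the graph's Slice outputs.
    x1, x2 = x[..., :32], x[..., 32:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(x, cos, sin):
    # Mul / Mul_1 / Add_1 in the graph: x*cos + rotate_half(x)*sin.
    return x * cos + rotate_half(x) * sin

batch, n_heads, seq, head_dim = 1, 14, 8, 64
rng = np.random.default_rng(0)
q = rng.standard_normal((batch, n_heads, seq, head_dim)).astype(np.float32)

# cos/sin gathered per position (Gather_4 / Gather_5), then unsqueezed to
# (batch, 1, seq, 64) so they broadcast across heads (Unsqueeze_6 / Unsqueeze_7).
pos = np.arange(seq)[:, None]
inv_freq = 1.0 / (10000.0 ** (np.arange(0, head_dim, 2) / head_dim))  # assumed base
angles = np.concatenate([pos * inv_freq, pos * inv_freq], axis=-1)    # (seq, 64)
cos, sin = np.cos(angles)[None, None], np.sin(angles)[None, None]

q_rot = apply_rope(q, cos, sin)
print(q_rot.shape)  # (1, 14, 8, 64)
```

Because each (x1, x2) pair is a pure rotation, the transform preserves vector norms and leaves position 0 unchanged.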
-------------------------------------------------- Operation: /model/layers.22/self_attn/rotary_emb/Slice (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/rotary_emb/Slice_1 (Slice) Inputs: [(1,)] Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_9 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_10 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_4 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_5 (Gather) Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)] Outputs: [('batch_size', 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Equal_1 (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_6 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_7 (Unsqueeze) Inputs: [('batch_size', 'sequence_length', 64)] Outputs: [('batch_size', 1, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Where_1 (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Mul (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), 
('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Mul_2 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Mul_1 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Mul_3 (Mul) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Expand_1 (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Add_1 (Add) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Add_2 (Add) Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Reshape_6 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_5 (Concat) Inputs: [('batch_size', 2, 'sequence_length', 64)] Outputs: [('batch_size', 2, 
'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Mul_8 (Mul) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Shape_6 (Shape) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_12 (Unsqueeze) Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_8 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_10 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_13 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_15 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_7 (Concat) Inputs: [(1,), (1,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Concat_8 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Equal (Equal) Inputs: [(5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Where (Where) Inputs: [(5,), (5,)] Outputs: [(5,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Expand (Expand) Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)] Outputs: [('batch_size', 2, 7, 
'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Reshape_4 (Reshape) Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)] Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Shape_16 (Shape) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Transpose_3 (Transpose) Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.22/self_attn/Gather_16 (Gather) Inputs: [(4,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/self_attn/Mul_9 (Mul) Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.22/self_attn/Unsqueeze_30 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.22/self_attn/MatMul (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.22/self_attn/Slice_4 (Slice) Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)] Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.22/self_attn/Add_3 (Add) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')] Outputs: 
[('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.22/self_attn/Softmax (Softmax) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')] -------------------------------------------------- Operation: /model/layers.22/self_attn/MatMul_1 (MatMul) Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)] Outputs: [('batch_size', 14, 'sequence_length', 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Transpose_4 (Transpose) Inputs: [('batch_size', 14, 'sequence_length', 64)] Outputs: [('batch_size', 'sequence_length', 14, 64)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Reshape_7 (Reshape) Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.22/self_attn/o_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/self_attn/o_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: 
/model/layers.22/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/Add (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/post_attention_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/post_attention_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.22/post_attention_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.22/post_attention_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.22/post_attention_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.22/post_attention_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/post_attention_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.22/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.22/mlp/gate_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/gate_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/up_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/up_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/mlp/up_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/act_fn/Sigmoid 
(Sigmoid) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/act_fn/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864)] -------------------------------------------------- Operation: /model/layers.22/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 4864)] Outputs: [('batch_size', 'sequence_length', 4864), (), ()] -------------------------------------------------- Operation: /model/layers.22/mlp/down_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 4864), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/mlp/down_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.22/mlp/down_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.22/Add_1 (Add) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: 
/model/layers.23/input_layernorm/Pow (Pow) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.23/input_layernorm/ReduceMean (ReduceMean) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.23/input_layernorm/Add (Add) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.23/input_layernorm/Sqrt (Sqrt) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.23/input_layernorm/Div (Div) Inputs: [('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 1)] -------------------------------------------------- Operation: /model/layers.23/input_layernorm/Mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.23/input_layernorm/Mul_1 (Mul) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.23/self_attn/Shape (Shape) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [(3,)] -------------------------------------------------- Operation: /model/layers.23/input_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896), (), ()] -------------------------------------------------- Operation: /model/layers.23/self_attn/q_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 
896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.23/self_attn/q_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.23/self_attn/q_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.23/self_attn/q_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.23/self_attn/k_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.23/self_attn/k_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.23/self_attn/k_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.23/self_attn/k_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.23/self_attn/v_proj/MatMul_quant (MatMulInteger) Inputs: [('batch_size', 'sequence_length', 896), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.23/self_attn/v_proj/MatMul_output_0_output_quantized_cast (Cast) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 
'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.23/self_attn/v_proj/MatMul_quant_scales_mul (Mul) Inputs: [()] Outputs: [()] -------------------------------------------------- Operation: /model/layers.23/self_attn/v_proj/MatMul_quant_output_scale_mul (Mul) Inputs: [('batch_size', 'sequence_length', 128), ()] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.23/self_attn/Gather (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.23/self_attn/Gather_1 (Gather) Inputs: [(3,)] Outputs: [()] -------------------------------------------------- Operation: /model/layers.23/self_attn/q_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 896)] Outputs: [('batch_size', 'sequence_length', 896)] -------------------------------------------------- Operation: /model/layers.23/self_attn/k_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.23/self_attn/v_proj/Add (Add) Inputs: [('batch_size', 'sequence_length', 128)] Outputs: [('batch_size', 'sequence_length', 128)] -------------------------------------------------- Operation: /model/layers.23/self_attn/Unsqueeze (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.23/self_attn/Unsqueeze_1 (Unsqueeze) Inputs: [()] Outputs: [(1,)] -------------------------------------------------- Operation: /model/layers.23/self_attn/Concat (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.23/self_attn/Concat_1 (Concat) Inputs: [(1,), (1,)] Outputs: [(4,)] -------------------------------------------------- Operation: /model/layers.23/self_attn/Concat_11 (Concat) Inputs: [(1,), (1,)] 
Outputs: [(3,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Reshape (Reshape)
Inputs: [('batch_size', 'sequence_length', 896), (4,)]
Outputs: [('batch_size', 'sequence_length', 14, 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Reshape_1 (Reshape)
Inputs: [('batch_size', 'sequence_length', 128), (4,)]
Outputs: [('batch_size', 'sequence_length', 2, 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Reshape_2 (Reshape)
Inputs: [('batch_size', 'sequence_length', 128), (4,)]
Outputs: [('batch_size', 'sequence_length', 2, 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Transpose (Transpose)
Inputs: [('batch_size', 'sequence_length', 14, 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Transpose_1 (Transpose)
Inputs: [('batch_size', 'sequence_length', 2, 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Transpose_2 (Transpose)
Inputs: [('batch_size', 'sequence_length', 2, 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Shape_2 (Shape)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Concat_6 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Slice (Slice)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Slice_1 (Slice)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Slice_2 (Slice)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Slice_3 (Slice)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Gather_2 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Shape_11 (Shape)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_21 (Unsqueeze)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Neg (Neg)
Inputs: [('batch_size', 14, 'sequence_length', 32)]
Outputs: [('batch_size', 14, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Neg_1 (Neg)
Inputs: [('batch_size', 2, 'sequence_length', 32)]
Outputs: [('batch_size', 2, 'sequence_length', 32)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Add (Add)
Inputs: [(), ()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Gather_12 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Gather_14 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Concat_3 (Concat)
Inputs: [('batch_size', 14, 'sequence_length', 32), ('batch_size', 14, 'sequence_length', 32)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Concat_4 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 32), ('batch_size', 2, 'sequence_length', 32)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/rotary_emb/Unsqueeze (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_22 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_24 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/rotary_emb/Slice (Slice)
Inputs: [(1,)]
Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/rotary_emb/Slice_1 (Slice)
Inputs: [(1,)]
Outputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Concat_9 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Concat_10 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Gather_4 (Gather)
Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Outputs: [('batch_size', 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Gather_5 (Gather)
Inputs: [('Min(32768, past_sequence_length + sequence_length)', 64)]
Outputs: [('batch_size', 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Equal_1 (Equal)
Inputs: [(5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_6 (Unsqueeze)
Inputs: [('batch_size', 'sequence_length', 64)]
Outputs: [('batch_size', 1, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_7 (Unsqueeze)
Inputs: [('batch_size', 'sequence_length', 64)]
Outputs: [('batch_size', 1, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Where_1 (Where)
Inputs: [(5,), (5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Mul (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Mul_2 (Mul)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Mul_1 (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Mul_3 (Mul)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 1, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Expand_1 (Expand)
Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)]
Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Add_1 (Add)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Add_2 (Add)
Inputs: [('batch_size', 2, 'sequence_length', 64), ('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Reshape_6 (Reshape)
Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)]
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Concat_5 (Concat)
Inputs: [('batch_size', 2, 'sequence_length', 64)]
Outputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Mul_8 (Mul)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Shape_6 (Shape)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_12 (Unsqueeze)
Inputs: [('batch_size', 2, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Gather_8 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Gather_10 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_13 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
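The Slice/Neg/Concat_3 nodes above, followed by the cos/sin Gathers (Gather_4/Gather_5) and the Mul/Add pairs, match the standard rotate-half rotary position embedding (RoPE) pattern on the 64-wide head dimension. A minimal NumPy sketch of what that subgraph appears to compute — the function names and the split point at 32 are read off the shapes in the dump, not taken from the graph itself:

```python
import numpy as np

def rotate_half(x):
    # Slice/Slice_1 + Neg + Concat_3 pattern: split the head dim (64) into
    # two 32-wide halves, negate the second, and swap their order.
    x1, x2 = x[..., :32], x[..., 32:]
    return np.concatenate([-x2, x1], axis=-1)

def apply_rope(q, cos, sin):
    # Mul / Mul_1 / Add_1 pattern: q' = q*cos + rotate_half(q)*sin.
    # cos/sin have shape (batch, seq, 64) and are broadcast over the head
    # axis, mirroring the Unsqueeze_6 / Unsqueeze_7 nodes.
    return q * cos[:, None, :, :] + rotate_half(q) * sin[:, None, :, :]
```

The same pair of ops is applied to both the 14-head query path and the 2-head key path, which is why Mul/Mul_1 and Mul_2/Mul_3 appear twice with different head counts.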
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_15 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Concat_7 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Concat_8 (Concat)
Inputs: [(1,), (1,)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Equal (Equal)
Inputs: [(5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Where (Where)
Inputs: [(5,), (5,)]
Outputs: [(5,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Expand (Expand)
Inputs: [('batch_size', 2, 1, 'past_sequence_length + 1', 64), (5,)]
Outputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Reshape_4 (Reshape)
Inputs: [('batch_size', 2, 7, 'past_sequence_length + 1', 64), (4,)]
Outputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Shape_16 (Shape)
Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [(4,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Transpose_3 (Transpose)
Inputs: [('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Gather_16 (Gather)
Inputs: [(4,)]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Mul_9 (Mul)
Inputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 64, 'past_sequence_length + 1')]
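The Unsqueeze -> Expand -> Reshape chains (Unsqueeze_12/Expand/Reshape_4 and the matching Unsqueeze_21/Expand_1/Reshape_6 pair) take the 2 cached KV heads to 14 heads via a repetition factor of 7 — the usual grouped-query attention trick of tiling K/V so they line up with the query heads. A sketch, assuming a `repeat_kv`-style helper (the name is an assumption):

```python
import numpy as np

def repeat_kv(kv, n_rep=7):
    # Unsqueeze -> Expand -> Reshape pattern: insert a new axis, broadcast
    # it to n_rep copies, then fold it into the head axis.
    b, kvh, t, d = kv.shape                            # (batch, 2, past+1, 64)
    kv = np.broadcast_to(kv[:, :, None], (b, kvh, n_rep, t, d))
    return kv.reshape(b, kvh * n_rep, t, d)            # (batch, 14, past+1, 64)
```

The surrounding Shape/Gather/Concat/Equal/Where nodes just assemble the target shape for Expand at run time, since `past_sequence_length + 1` is dynamic.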
--------------------------------------------------
Operation: /model/layers.23/self_attn/Unsqueeze_30 (Unsqueeze)
Inputs: [()]
Outputs: [(1,)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/MatMul (MatMul)
Inputs: [('batch_size', 14, 'sequence_length', 64), ('batch_size', 14, 64, 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Slice_4 (Slice)
Inputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1'), (1,)]
Outputs: [('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Add_3 (Add)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 1, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Softmax (Softmax)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
Outputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1')]
--------------------------------------------------
Operation: /model/layers.23/self_attn/MatMul_1 (MatMul)
Inputs: [('batch_size', 14, 'sequence_length', 'past_sequence_length + 1'), ('batch_size', 14, 'past_sequence_length + 1', 64)]
Outputs: [('batch_size', 14, 'sequence_length', 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Transpose_4 (Transpose)
Inputs: [('batch_size', 14, 'sequence_length', 64)]
Outputs: [('batch_size', 'sequence_length', 14, 64)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/Reshape_7 (Reshape)
Inputs: [('batch_size', 'sequence_length', 14, 64), (3,)]
Outputs: [('batch_size', 'sequence_length', 896)]
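MatMul -> Slice_4/Add_3 (attention mask) -> Softmax -> MatMul_1 -> Transpose_4 -> Reshape_7 is a textbook scaled dot-product attention core. A NumPy sketch, under the assumption that the 1/sqrt(64) scaling is what the earlier Mul_8/Mul_9 nodes fold into Q and K (so q and k arrive pre-scaled here):

```python
import numpy as np

def attention(q, k, v, mask):
    # MatMul: QK^T via the pre-transposed K (Transpose_3 in the graph).
    scores = q @ k.transpose(0, 1, 3, 2) + mask        # (b, 14, s, past+1)
    # Softmax over the key axis, written out with the usual max-shift
    # for numerical stability.
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    ctx = probs @ v                                    # (b, 14, s, 64)
    # Transpose_4 + Reshape_7: merge the 14 heads back into hidden size 896.
    b, h, s, d = ctx.shape
    return ctx.transpose(0, 2, 1, 3).reshape(b, s, h * d)
```

The mask enters with a broadcast head axis of 1, exactly as the `('batch_size', 1, ...)` shape on Slice_4/Add_3 shows.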
--------------------------------------------------
Operation: /model/layers.23/self_attn/Reshape_7_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/o_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/o_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/self_attn/o_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/self_attn/o_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/Add (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/post_attention_layernorm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/post_attention_layernorm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.23/post_attention_layernorm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
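The DynamicQuantizeLinear -> MatMulInteger -> Cast -> Mul chain on o_proj (and, below, on gate_proj/up_proj/down_proj) is the standard dynamic-quantization pattern: the activation is quantized to uint8 per tensor at run time, multiplied against pre-quantized int8 weights with int32 accumulation, cast to float, and rescaled by the product of the activation and weight scales (the `_scales_mul` node). A simplified sketch — a single per-tensor weight scale is assumed here, whereas real exports often carry per-column scales:

```python
import numpy as np

def dynamic_quant_matmul(x, w_q, w_scale):
    # DynamicQuantizeLinear: per-tensor uint8 quantization of the
    # activation, with the range widened to always include zero.
    lo, hi = min(x.min(), 0.0), max(x.max(), 0.0)
    x_scale = (hi - lo) / 255.0
    if x_scale == 0.0:
        x_scale = 1.0
    zp = np.uint8(np.clip(np.round(-lo / x_scale), 0, 255))
    x_q = np.clip(np.round(x / x_scale) + zp, 0, 255).astype(np.uint8)
    # MatMulInteger -> Cast -> output_scale_mul: accumulate in int32
    # (subtracting the zero point), then rescale to float.
    acc = (x_q.astype(np.int32) - np.int32(zp)) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * np.float32(x_scale * w_scale)
```

The result then feeds the residual Add and the Pow/ReduceMean RMSNorm prologue shown above.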
--------------------------------------------------
Operation: /model/layers.23/post_attention_layernorm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.23/post_attention_layernorm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/layers.23/post_attention_layernorm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/post_attention_layernorm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/post_attention_layernorm/Mul_1_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896), (), ()]
--------------------------------------------------
Operation: /model/layers.23/mlp/gate_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.23/mlp/gate_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.23/mlp/gate_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/mlp/gate_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
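The Pow -> ReduceMean -> Add -> Sqrt -> Div -> Mul -> Mul_1 sequence (here for `post_attention_layernorm`, and again for `/model/norm` at the end) is RMSNorm written out element by element: square, mean over the hidden axis, add epsilon, take the reciprocal square root, scale the input, then multiply by the learned weight. A sketch — the epsilon value is an assumption; the actual constant is an initializer of the Add node:

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Pow + ReduceMean: mean of x^2 over the 896-wide hidden axis.
    var = np.mean(x ** 2, axis=-1, keepdims=True)
    # Add(eps) + Sqrt + Div + Mul + Mul_1(weight).
    return x / np.sqrt(var + eps) * weight
```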
--------------------------------------------------
Operation: /model/layers.23/mlp/up_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.23/mlp/up_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.23/mlp/up_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/mlp/up_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.23/mlp/act_fn/Sigmoid (Sigmoid)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.23/mlp/act_fn/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.23/mlp/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 4864), ('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864)]
--------------------------------------------------
Operation: /model/layers.23/mlp/Mul_output_0_QuantizeLinear (DynamicQuantizeLinear)
Inputs: [('batch_size', 'sequence_length', 4864)]
Outputs: [('batch_size', 'sequence_length', 4864), (), ()]
--------------------------------------------------
Operation: /model/layers.23/mlp/down_proj/MatMul_quant (MatMulInteger)
Inputs: [('batch_size', 'sequence_length', 4864), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
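gate_proj and up_proj (896 -> 4864), Sigmoid + Mul on the gate (SiLU), the elementwise `mlp/Mul`, and down_proj (4864 -> 896) together form a SwiGLU-style MLP. A float sketch that ignores the quantization wrappers (weight names are placeholders, not graph tensor names):

```python
import numpy as np

def swiglu_mlp(x, w_gate, w_up, w_down):
    # gate_proj, then act_fn: SiLU(g) = g * sigmoid(g)
    # (the Sigmoid + act_fn/Mul pair in the graph).
    g = x @ w_gate
    g = g * (1.0 / (1.0 + np.exp(-g)))
    # mlp/Mul (gate * up), then down_proj back to the hidden size.
    return (g * (x @ w_up)) @ w_down
```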
--------------------------------------------------
Operation: /model/layers.23/mlp/down_proj/MatMul_output_0_output_quantized_cast (Cast)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/mlp/down_proj/MatMul_quant_scales_mul (Mul)
Inputs: [()]
Outputs: [()]
--------------------------------------------------
Operation: /model/layers.23/mlp/down_proj/MatMul_quant_output_scale_mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ()]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/layers.23/Add_1 (Add)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/norm/Pow (Pow)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/norm/ReduceMean (ReduceMean)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/norm/Add (Add)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/norm/Sqrt (Sqrt)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/norm/Div (Div)
Inputs: [('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 1)]
--------------------------------------------------
Operation: /model/norm/Mul (Mul)
Inputs: [('batch_size', 'sequence_length', 896), ('batch_size', 'sequence_length', 1)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: /model/norm/Mul_1 (Mul)
Inputs: [('batch_size', 'sequence_length', 896)]
Outputs: [('batch_size', 'sequence_length', 896)]
--------------------------------------------------
Operation: model.embed_tokens.weight_transposed_DequantizeLinear (DequantizeLinear)
Inputs: [(896, 151936)]
Outputs: [(896, 151936)]
--------------------------------------------------
Operation: /lm_head/MatMul (MatMul)
Inputs: [('batch_size', 'sequence_length', 896), (896, 151936)]
Outputs: [('batch_size', 'sequence_length', 151936)]
--------------------------------------------------
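The `model.embed_tokens.weight_transposed_DequantizeLinear` node (together with the `Transpose_6764` producing a (896, 151936) tensor earlier in the dump) suggests the output projection reuses the embedding table — i.e. weight tying: the quantized embeddings are dequantized on the fly and the final hidden states are multiplied against them to produce logits over the 151936-token vocabulary. A sketch, assuming a single per-tensor dequantization scale for simplicity:

```python
import numpy as np

def lm_head_logits(hidden, embed_q, embed_scale):
    # DequantizeLinear: recover the float embedding matrix, already
    # transposed to (hidden, vocab) = (896, 151936).
    w = embed_q.astype(np.float32) * np.float32(embed_scale)
    # /lm_head/MatMul: (b, s, 896) @ (896, 151936) -> (b, s, 151936).
    return hidden @ w
```

During decoding, `sequence_length` is 1, so this final matmul is a single (896,) x (896, 151936) product per step.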