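There are two ways to export the computation graph when converting an ordinary PyTorch model to a TorchScript model: tracing (torch.jit.trace) and scripting (torch.jit.script).
Tracing exports a static graph by actually running the model once, so it cannot capture control flow in the model (such as loops); only the path taken by the example input is recorded. Scripting parses the model's source code and therefore records all control flow correctly.
First, the inputs to the model's forward should be tensors; non-tensor arguments may raise errors or be baked into the graph as constants.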
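The difference between the two export methods can be seen in a minimal sketch. The function name shift_by_sign and its inputs are made up for illustration and are not part of the original model; tracing this function emits a TracerWarning, which is expected here.

```python
import torch


def shift_by_sign(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: the path taken depends on the values in x.
    if x.sum() > 0:
        return x + 1
    return x - 1


# Tracing runs the function once with a positive example input,
# so only the "x + 1" path ends up in the graph.
traced = torch.jit.trace(shift_by_sign, torch.ones(3))

# Scripting compiles the source code, so both branches are kept.
scripted = torch.jit.script(shift_by_sign)

neg = -torch.ones(3)
traced_out = traced(neg)      # still follows the recorded "x + 1" path
scripted_out = scripted(neg)  # follows the "x - 1" path
```

With a negative input the traced module keeps adding 1, while the scripted module takes the other branch.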
Converting to TorchScript:
Problem 1:
RuntimeError:
Expected a default value of type Tensor (inferred) on parameter "max_len". Because "max_len" was not annotated with an explicit type it is assumed to be type 'Tensor'.
def get_mask_from_lengths(lengths, max_len=None):
    batch_size = lengths.shape[0]
    if max_len is None:
        max_len = torch.max(lengths).item()
    ids = torch.arange(0, max_len).unsqueeze(0).expand(batch_size, -1).cuda()
    mask = ids >= lengths.unsqueeze(1).expand(-1, max_len)
    return mask
RuntimeError:
Tensors of type UndefinedTensorImpl do not have strides
class FFTBlock(nn.Module):
    """FFT Block"""

    def __init__(self, d_model, n_head, d_k, d_v, d_inner, kernel_size, dropout=0.1):
        super(FFTBlock, self).__init__()
        self.slf_attn = MultiHeadAttention(n_head, d_model, d_k, d_v, dropout=dropout)
        self.pos_ffn = PositionwiseFeedForward(
            d_model, d_inner, kernel_size, dropout=dropout
        )

    def forward(self, enc_input, mask, slf_attn_mask):  # mask=None, slf_attn_mask=None
        enc_output, enc_slf_attn = self.slf_attn(
            enc_input, enc_input, enc_input, slf_attn_mask
        )
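Solution: avoid default arguments where possible. A default of None raises an error because an unannotated parameter is assumed to be a Tensor; if a default value is not a tensor, the parameter's type must be annotated explicitly.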
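When a None default cannot be dropped, annotating the parameter is one way to satisfy the compiler. The following is a sketch of get_mask_from_lengths under that assumption; the Optional[int] annotation, the int(...) cast and the device=lengths.device argument are additions for illustration, not the original author's code.

```python
from typing import Optional

import torch


@torch.jit.script
def get_mask_from_lengths(lengths: torch.Tensor, max_len: Optional[int] = None) -> torch.Tensor:
    batch_size = lengths.shape[0]
    if max_len is None:
        # .item() yields a generic number in TorchScript; cast it to int explicitly.
        max_len = int(torch.max(lengths).item())
    # After the branch above, max_len is known to be an int.
    ids = torch.arange(0, max_len, device=lengths.device).unsqueeze(0).expand(batch_size, -1)
    mask = ids >= lengths.unsqueeze(1).expand(-1, max_len)
    return mask


example_mask = get_mask_from_lengths(torch.tensor([1, 3]))
```

Positions at or beyond each sequence length are marked True, so the first row of example_mask is [False, True, True] and the second is all False.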
Problem 2:
RuntimeError:
python value of type 'device' cannot be used as a value. Perhaps it is a closed over global variable? If so, please consider passing it in as an argument or use a local varible instead.
ids = torch.arange(0, max_len).unsqueeze(0).expand(batch_size, -1).to(device)
Solution: use .cuda() instead of .to(device), because device here is a Python global that TorchScript cannot capture.
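An alternative that avoids hard-coding CUDA is to take the device from a tensor that is already an argument. This is a sketch, not the original fix; make_ids is a hypothetical helper name.

```python
import torch


@torch.jit.script
def make_ids(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    # lengths.device comes from an argument, not from a closed-over Python global,
    # so the compiler accepts it and the result follows the input's device.
    ids = torch.arange(0, max_len, device=lengths.device)
    return ids.unsqueeze(0).expand(lengths.shape[0], -1)


example_ids = make_ids(torch.tensor([2, 5]), 5)
```

The same scripted function then works for CPU and GPU inputs without modification.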
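Problem 3:
RuntimeError: Type mismatch: max_len is set to type Tensor (inferred) in the true branch and type int in the false branch:
def pad(input_ele, mel_max_length):
    if mel_max_length:
        max_len = mel_max_length
    else:
        max_len = max([input_ele[i].size(0) for i in range(len(input_ele))])
Solution: avoid if statements in inference code where possible. The mismatch arises because the unannotated mel_max_length is inferred as a Tensor, so the two branches assign different types to max_len.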
Problem 4:
RuntimeError: Arguments for call are not valid.
The following variants are available:
aten::list(str t) -> (str[]):
Argument t not provided.
def LR(self, x, duration):
    output = list()  # error
    for batch, expand_target in zip(x, duration):
        expanded = self.expand(batch, expand_target)
        output.append(expanded)
Solution: output = list() --> output = []
Problem 5:
RuntimeError:
aten::append.t(t[](a!) self, t(c -> *) el) -> (t[](a!)):
Could not match type int to t in argument 'el': Type variable 't' previously matched to type Tensor is matched to type int.
def LR(self, x):
    out_len = []
    batchsize = x.shape[0]
    out_len.append(batchsize)
Solution: annotate the element type explicitly; an unannotated empty list is assumed to be a list of tensors (List is imported from typing).
def LR(self, x):
    out_len: List[int] = []
    batchsize = x.shape[0]
    out_len.append(batchsize)
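Problem 6:
RuntimeError: Expected a value of type 'Tensor (inferred)' for argument 'input_ele' but instead found type 'List[Tensor]'.
def pad(input_ele):
    max_len = max([input_ele[i].size(0) for i in range(len(input_ele))])
Solution: in inference code, pass tensors as function arguments and avoid other types such as lists or ints where possible. If a non-tensor argument is unavoidable, it needs an explicit type annotation.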
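If the list argument has to stay, annotating it is one option. The sketch below annotates both arguments of pad, which also covers the branch type mismatch from Problem 3. The padding body (F.pad plus torch.stack) is an assumption added for illustration, since the original function body is not shown in full.

```python
from typing import List, Optional

import torch
import torch.nn.functional as F


@torch.jit.script
def pad(input_ele: List[torch.Tensor], mel_max_length: Optional[int] = None) -> torch.Tensor:
    # Both branches now assign an int, so the types of max_len agree.
    if mel_max_length is not None:
        max_len = mel_max_length
    else:
        max_len = max([t.size(0) for t in input_ele])
    padded: List[torch.Tensor] = []
    for t in input_ele:
        # Assumed behaviour: right-pad each 1-D tensor with zeros up to max_len.
        padded.append(F.pad(t, [0, max_len - t.size(0)]))
    return torch.stack(padded)


example_padded = pad([torch.ones(2), torch.ones(4)])
```

Note that the truthiness test `if mel_max_length:` is replaced by an explicit `is not None` check, which is what lets the compiler narrow Optional[int] to int.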
Problem 7:
RuntimeError:
Expected integer literal for index. ModuleList/Sequential indexing is only supported with integer literals. Enumeration is supported, e.g. 'for index, v in enumerate(self): ...'
def forward(self, x):
    x = self.conv_pre(x)
    for i in range(3):
        x = F.leaky_relu(x, 0.1)
        x = self.ups[i](x)  # error
Solution: iterate over a ModuleList with enumerate; len(nn.ModuleList()) and indexing with a loop variable are not supported.
def forward(self, x):
    x = self.conv_pre(x)
    # for i in range(3):
    for i, ups in enumerate(self.ups):
        x = F.leaky_relu(x, 0.1)
        # x = self.ups[i](x)  # error
        x = ups(x)
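Other issues:
Assigning None to x and then assigning a tensor to it raises an error; x needs to be initialized as a tensor.
Change x = None to x = torch.tensor([])
torch.jit.frontend.UnsupportedNodeError: Dict aren't supported
When initializing a dictionary inside forward, change a = {} to a = dict(). That said, avoid dict types in forward where possible, since they are error-prone.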
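# Original code:
view_shape[1:] = [1] * (len(view_shape) - 1)
# After the change:
for i in range(1, len(view_shape)):
    view_shape[i] = 1
Slicing is a Python-specific idiom, and assigning to a slice is not supported, so replace the slice assignment with a loop.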
if returnfps:
    return new_xyz, new_points, grouped_points, idx
else:
    return new_xyz, new_points
A conditional statement must not lead to different numbers of return values.
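One way to keep a single return signature is to always return the same tuple and fill the optional slots with None. This is a sketch under that assumption; the function name sample and the placeholder computations are hypothetical, and only the flag name returnfps is taken from the snippet above.

```python
from typing import Optional, Tuple

import torch


@torch.jit.script
def sample(xyz: torch.Tensor, returnfps: bool = False) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    new_xyz = xyz * 2  # placeholder computation for illustration
    idx: Optional[torch.Tensor] = None
    if returnfps:
        idx = torch.arange(xyz.shape[0])
    # Same number of return values on every path; idx is None when not requested.
    return new_xyz, idx


out_a, idx_a = sample(torch.ones(3))
out_b, idx_b = sample(torch.ones(3), True)
```

The caller then checks the second element against None instead of relying on the tuple length.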
torch.jit.frontend.UnsupportedNodeError: continue statements aren't supported
continue is not supported.
torch.jit.frontend.UnsupportedNodeError: try blocks aren't supported
try-except is not supported.
Unknown builtin op: aten::Tensor
torch.Tensor() cannot be used. To convert a Python int, float, etc. to a tensor, use torch.tensor(); to concatenate tensors, prefer torch.cat.
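Both replacements can be sketched in one small scripted function; the name scale_and_concat and its arguments are made up for illustration.

```python
import torch


@torch.jit.script
def scale_and_concat(x: torch.Tensor, factor: float) -> torch.Tensor:
    # torch.tensor() (lowercase) turns a Python number into a tensor.
    f = torch.tensor(factor)
    # torch.cat joins existing tensors instead of building one with torch.Tensor(...).
    return torch.cat([x, x * f], dim=0)


example_cat = scale_and_concat(torch.ones(2), 3.0)
```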
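tensor.bool() is not supported; use a comparison such as tensor > 0 instead (tensor != 0 matches .bool() exactly when negative values are possible).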
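TorchScript has an Optional type. For example, if a function can return either None or a tensor depending on an if branch, the return value is typed Optional[Tensor], and tensor methods and attributes such as tensor.shape or tensor.size() cannot be used on it directly.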
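An Optional[Tensor] value becomes usable again once it has been checked against None, because the compiler narrows the type inside the check. A minimal sketch, with hypothetical function names maybe_mask and mask_numel:

```python
from typing import Optional

import torch


@torch.jit.script
def maybe_mask(x: torch.Tensor, use_mask: bool) -> Optional[torch.Tensor]:
    if use_mask:
        return x > 0
    return None


@torch.jit.script
def mask_numel(x: torch.Tensor, use_mask: bool) -> int:
    m = maybe_mask(x, use_mask)
    n = 0
    if m is not None:
        # Inside this branch m is treated as a Tensor, so tensor methods are allowed.
        n = m.numel()
    return n


n_with = mask_numel(torch.ones(4), True)
n_without = mask_numel(torch.ones(4), False)
```

Calling m.numel() outside the None check would be rejected at compile time.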
ValueError: substring not found
Write comments in the model code in English, not Chinese.
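Converting to ONNX:
1. Control flow driven by the input data is not supported, i.e. the forward function must not contain conditional statements;
2. .item() converts a tensor into an ordinary Python value; any logic that converts between tensors and Python values may make ONNX inference incorrect.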