Notes on Writing PyTorch Model Code - 天翼云 Developer Community

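There are two ways to export the computation graph when converting an ordinary PyTorch model into a TorchScript model: tracing (trace) and scripting (script).
Tracing can only export a static graph by actually running the model once, so it cannot capture control flow in the model (such as loops). Scripting parses the model code and therefore records all control flow correctly.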
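The difference between the two export methods can be seen on a small function with a data-dependent branch. This is only a minimal sketch; the function `double_or_negate` is a made-up name for illustration and is not part of any model discussed here.

```python
import torch


def double_or_negate(x: torch.Tensor) -> torch.Tensor:
    # Data-dependent branch: which path runs depends on the values in x.
    if x.sum() > 0:
        return x * 2
    return -x


example = torch.ones(3)

# Tracing runs the function once and records only the branch taken for `example`.
# PyTorch emits a TracerWarning here, which is exactly the point of the demo.
traced = torch.jit.trace(double_or_negate, example)

# Scripting compiles the source code, so both branches are kept.
scripted = torch.jit.script(double_or_negate)

negative = -torch.ones(3)
traced_out = traced(negative)      # follows the recorded `x * 2` path
scripted_out = scripted(negative)  # re-evaluates the condition and negates
```

With the negative input, the traced function still doubles the tensor, while the scripted function takes the other branch.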

First of all, the inputs to the model's forward must be tensors. Non-tensor variables may raise errors or be baked in as constants.

Converting to TorchScript:

Problem 1:

RuntimeError:
Expected a default value of type Tensor (inferred) on parameter "max_len".Because "max_len" was not annotated with an explicit type it is assumed to be type 'Tensor'.

# Fails to script: max_len has no type annotation and defaults to None
def get_mask_from_lengths(lengths, max_len=None):
    batch_size = lengths.shape[0]
    if max_len is None:
        max_len = torch.max(lengths).item()
    ids = torch.arange(0, max_len).unsqueeze(0).expand(batch_size, -1).cuda()
    mask = ids >= lengths.unsqueeze(1).expand(-1, max_len)
    return mask

RuntimeError:
Tensors of type UndefinedTensorImpl do not have strides

class FFTBlock(nn.Module):
    """FFT Block"""

    def __init__(self, d_model, n_head, d_k, d_v, d_inner, kernel_size, dropout=0.1):
        super(FFTBlock, self).__init__()
        self.slf_attn = MultiHeadAttention(n_head, d_model, d_k, d_v, dropout=dropout)
        self.pos_ffn = PositionwiseFeedForward(
            d_model, d_inner, kernel_size, dropout=dropout
        )

    def forward(self, enc_input, mask, slf_attn_mask):  # was: mask=None, slf_attn_mask=None
        enc_output, enc_slf_attn = self.slf_attn(
            enc_input, enc_input, enc_input, slf_attn_mask
        )

Fix: avoid default arguments where possible. A default value of None raises an error on an unannotated parameter, and a default that is not a tensor needs an explicit type annotation.
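If the default argument has to stay, annotating it is one way to satisfy the compiler. The following is a hedged sketch of `get_mask_from_lengths` with `max_len` declared as `Optional[int]`. The helper variable `n`, the broadcasting comparison, and the use of `lengths.device` are choices made for this illustration, not the original author's code.

```python
from typing import Optional

import torch


@torch.jit.script
def get_mask_from_lengths(lengths: torch.Tensor, max_len: Optional[int] = None) -> torch.Tensor:
    # Resolve the optional argument into a plain int in both branches.
    if max_len is None:
        n = int(lengths.max().item())
    else:
        n = max_len
    ids = torch.arange(n, device=lengths.device)
    # Broadcasting compares every position with every length: result is (batch, n).
    return ids.unsqueeze(0) >= lengths.unsqueeze(1)


mask = get_mask_from_lengths(torch.tensor([1, 3]))
```

Positions at or beyond each sequence length come out as True, matching the `ids >= lengths` logic of the original function.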

Problem 2:

RuntimeError:
python value of type 'device' cannot be used as a value. Perhaps it is a closed over global variable? If so, please consider passing it in as an argument or use a local varible instead.

ids = torch.arange(0, max_len).unsqueeze(0).expand(batch_size, -1).to(device)

Fix: use .cuda() instead.
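An alternative that also runs on CPU is to take the device from a tensor that is already an argument, so the scripted code never touches a global `device`. This is a sketch under that assumption; `make_position_ids` is a hypothetical helper name.

```python
import torch


@torch.jit.script
def make_position_ids(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    # Create the tensor directly on the same device as the input,
    # so no global `device` variable is needed inside the scripted code.
    ids = torch.arange(max_len, device=lengths.device)
    return ids.unsqueeze(0).expand([lengths.size(0), -1])


lengths = torch.tensor([2, 5])
ids = make_position_ids(lengths, 5)
```

The result follows the input wherever it lives, which avoids hard-coding CUDA into the exported model.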

Problem 3:

RuntimeError: Type mismatch: max_len is set to type Tensor (inferred) in the true branch and type int in the false branch:

def pad(input_ele, mel_max_length):
    if mel_max_length:
        max_len = mel_max_length  # unannotated argument, inferred as Tensor
    else:
        max_len = max([input_ele[i].size(0) for i in range(len(input_ele))])  # int

Fix: at inference time, avoid if statements where possible.
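When the branch cannot be removed, the mismatch can also be avoided by making both branches produce the same type. A minimal sketch, assuming the length limit is meant to be an optional integer; `resolve_max_len` is an illustrative name and the explicit loop is one of several ways to compute the maximum.

```python
from typing import List, Optional

import torch


@torch.jit.script
def resolve_max_len(input_ele: List[torch.Tensor], mel_max_length: Optional[int] = None) -> int:
    # Both branches assign an int, so the compiler sees one consistent type.
    if mel_max_length is not None:
        max_len = mel_max_length
    else:
        max_len = 0
        for t in input_ele:
            max_len = max(max_len, t.size(0))
    return max_len


items = [torch.zeros(2), torch.zeros(5)]
longest = resolve_max_len(items)
capped = resolve_max_len(items, 8)
```

The key point is the annotation on `mel_max_length`: without it the argument is inferred as a Tensor, which is what the error message complains about.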

Problem 4:

RuntimeError: Arguments for call are not valid.
The following variants are available:
aten::list(str t) -> (str[]):
Argument t not provided.

def LR(self, x, duration):
    output = list()  # error
    for batch, expand_target in zip(x, duration):
        expanded = self.expand(batch, expand_target)
        output.append(expanded)

Fix: change output = list() to output = []
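A self-contained sketch of the same pattern, using an annotated empty list literal. The function `expand_frames` is a simplified stand-in for illustration only: it repeats each row of `x` a given number of times and does not reproduce the original `self.expand` logic.

```python
from typing import List

import torch


@torch.jit.script
def expand_frames(x: torch.Tensor, duration: torch.Tensor) -> List[torch.Tensor]:
    # An annotated empty literal replaces `list()`.
    output: List[torch.Tensor] = []
    for i in range(x.size(0)):
        repeats = int(duration[i])
        # Repeat row i `repeats` times along a new leading dimension.
        output.append(x[i].unsqueeze(0).expand([repeats, -1]))
    return output


frames = expand_frames(torch.tensor([[1.0, 2.0], [3.0, 4.0]]), torch.tensor([2, 1]))
```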

Problem 5:

RuntimeError:
aten::append.t(t[](a!) self, t(c -> *) el) -> (t[](a!)):
Could not match type int to t in argument 'el': Type variable 't' previously matched to type Tensor is matched to type int.

def LR(self, x):
    out_len = []  # an unannotated empty list is assumed to hold tensors
    batchsize = x.shape[0]
    out_len.append(batchsize)

Fix: the element type must be declared explicitly.

from typing import List

def LR(self, x):
    out_len: List[int] = []
    batchsize = x.shape[0]
    out_len.append(batchsize)

Problem 6:

RuntimeError: Expected a value of type 'Tensor (inferred)' for argument 'input_ele' but instead found type 'List[Tensor]'.

def pad(input_ele):
    max_len = max([input_ele[i].size(0) for i in range(len(input_ele))])

Fix: at inference time, pass tensors as function arguments, and avoid passing other types such as lists or integers where possible.
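If a list argument is unavoidable, an explicit `List[torch.Tensor]` annotation tells the compiler not to infer `Tensor`. The sketch below is an assumption about what a completed `pad` might look like for 1-D inputs; the padding and stacking steps are not in the original excerpt.

```python
from typing import List

import torch
import torch.nn.functional as F


@torch.jit.script
def pad(input_ele: List[torch.Tensor]) -> torch.Tensor:
    max_len = max([t.size(0) for t in input_ele])
    padded: List[torch.Tensor] = []
    for t in input_ele:
        # Right-pad each 1-D tensor with zeros up to max_len.
        padded.append(F.pad(t, [0, max_len - t.size(0)]))
    return torch.stack(padded)


batch = pad([torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0])])
```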

Problem 7:

RuntimeError:
Expected integer literal for index. ModuleList/Sequential indexing is only supported with integer literals. Enumeration is supported, e.g. 'for index, v in enumerate(self): ...'

def forward(self, x):
    x = self.conv_pre(x)
    for i in range(3):
        x = F.leaky_relu(x, 0.1)
        x = self.ups[i](x)  # error

Fix: iterate over a ModuleList with enumerate. len(nn.ModuleList()) and indexing with a loop variable are not supported.

def forward(self, x):
    x = self.conv_pre(x)
    # for i in range(3):
    for i, ups in enumerate(self.ups):
        x = F.leaky_relu(x, 0.1)
        # x = self.ups[i](x)  # error
        x = ups(x)
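For reference, here is a complete module that scripts with this loop style. It is a toy sketch: the class name `Upsampler` and all layer sizes are invented for the example, and it iterates over the ModuleList directly rather than with enumerate since the index is not needed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Upsampler(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv_pre = nn.Conv1d(1, 4, kernel_size=3, padding=1)
        # Each transposed convolution doubles the sequence length.
        self.ups = nn.ModuleList(
            [nn.ConvTranspose1d(4, 4, kernel_size=4, stride=2, padding=1) for _ in range(3)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv_pre(x)
        # Iterate over the ModuleList instead of indexing it with a variable.
        for up in self.ups:
            x = F.leaky_relu(x, 0.1)
            x = up(x)
        return x


scripted = torch.jit.script(Upsampler())
out = scripted(torch.randn(1, 1, 8))
```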

Other issues:

Assigning None to x first and a tensor later raises an error; x has to be initialized as a tensor.

Change x = None to x = torch.tensor([])
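The same idea in a runnable form: the accumulator starts as an empty tensor, so its type is Tensor from the first assignment. This sketch uses `torch.zeros` with a zero-length first dimension instead of `torch.tensor([])` so the shapes line up for `torch.cat`; `concat_rows` is a hypothetical name and float inputs are assumed.

```python
from typing import List

import torch


@torch.jit.script
def concat_rows(chunks: List[torch.Tensor]) -> torch.Tensor:
    # Start from an empty tensor with the right trailing shape instead of None.
    out = torch.zeros([0, chunks[0].size(1)])
    for chunk in chunks:
        out = torch.cat([out, chunk], dim=0)
    return out


rows = concat_rows([torch.ones(2, 3), torch.ones(1, 3)])
```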

torch.jit.frontend.UnsupportedNodeError: Dict aren't supported

When initializing a dict inside forward, change a={} to a=dict(). That said, avoid dict types in forward where possible, since they are error-prone.

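# Original code:
view_shape[1:] = [1] * (len(view_shape) - 1)

# After the change:
for i in range(1, len(view_shape)):
    view_shape[i] = 1

Slicing is a Python-specific operation, and TorchScript does not support assignment to a slice, so the slice assignment has to be replaced with a loop.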
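Wrapped into a scripted function, the loop version looks roughly like this. `keep_first_dim` is an illustrative name, and the list copy is added only so the example is self-contained.

```python
from typing import List

import torch


@torch.jit.script
def keep_first_dim(shape: List[int]) -> List[int]:
    view_shape = [s for s in shape]  # work on a copy of the argument
    # Element-wise assignment instead of `view_shape[1:] = ...`.
    for i in range(1, len(view_shape)):
        view_shape[i] = 1
    return view_shape


view_shape = keep_first_dim([4, 5, 6])
```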

if returnfps:
    return new_xyz, new_points, grouped_points, idx
else:
    return new_xyz, new_points

A conditional must not lead to a different number of return values.
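One way to keep a flag like this is to always return the same number of values and fill the optional ones with None. The sketch below shows the idea on a small sorting function; `sort_values` and its arguments are made up for the example and are unrelated to the point-cloud code above.

```python
from typing import Optional, Tuple

import torch


@torch.jit.script
def sort_values(x: torch.Tensor, return_idx: bool) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
    values, idx = torch.sort(x)
    # Always return two items; the second stays None when it is not requested.
    idx_out: Optional[torch.Tensor] = None
    if return_idx:
        idx_out = idx
    return values, idx_out


values, no_idx = sort_values(torch.tensor([3.0, 1.0, 2.0]), False)
values2, idx2 = sort_values(torch.tensor([3.0, 1.0, 2.0]), True)
```

The caller then checks the second element for None instead of relying on the tuple length.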

torch.jit.frontend.UnsupportedNodeError: continue statements aren't supported

continue is not supported.

torch.jit.frontend.UnsupportedNodeError: try blocks aren't supported

try-except is not supported.
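Both restrictions above can usually be worked around with plain if statements: invert the condition instead of skipping with continue, and check the failure case up front instead of catching an exception. A small sketch with a made-up function `mean_of_positive`:

```python
from typing import List

import torch


@torch.jit.script
def mean_of_positive(values: List[float]) -> float:
    total = 0.0
    count = 0
    for v in values:
        # Instead of `if v <= 0: continue`, guard the loop body.
        if v > 0:
            total += v
            count += 1
    # Instead of try/except ZeroDivisionError, test the condition explicitly.
    if count == 0:
        return 0.0
    return total / float(count)


avg = mean_of_positive([1.0, -2.0, 3.0])
none_positive = mean_of_positive([-1.0])
```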

Unknown builtin op: aten::Tensor

torch.Tensor() cannot be used. To convert a Python int, float, etc. into a tensor, use torch.tensor(); to concatenate tensors, prefer torch.cat.
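A brief sketch of both replacements together, with an illustrative function name `scale_and_join`:

```python
import torch


@torch.jit.script
def scale_and_join(x: torch.Tensor, factor: float) -> torch.Tensor:
    # torch.tensor() (lowercase) converts the Python float into a 0-dim tensor.
    f = torch.tensor(factor)
    # torch.cat joins existing tensors along dimension 0.
    return torch.cat([x, x * f], dim=0)


joined = scale_and_join(torch.tensor([1.0, 2.0]), 2.0)
```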

tensor.bool() is not supported; tensor > 0 can be used directly instead (or tensor != 0 if negative values should also map to True).

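TorchScript has an Optional type. For example, if a function can return either None or a tensor depending on an if branch, its return value is treated as Optional[Tensor]. As a result, tensor methods and attributes such as tensor.shape and tensor.size() cannot be used on that return value directly.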
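TorchScript narrows an Optional value back to its inner type after an explicit None check, which is the usual way to get at the tensor methods again. A minimal sketch with two hypothetical functions, `first_or_none` and `num_rows`:

```python
from typing import Optional

import torch


@torch.jit.script
def first_or_none(x: torch.Tensor, keep: bool) -> Optional[torch.Tensor]:
    if keep:
        return x
    return None


@torch.jit.script
def num_rows(x: torch.Tensor, keep: bool) -> int:
    y = first_or_none(x, keep)
    # y is Optional[Tensor] here; the `is not None` check refines it to Tensor.
    if y is not None:
        return y.size(0)
    return 0


rows_kept = num_rows(torch.zeros(3, 2), True)
rows_dropped = num_rows(torch.zeros(3, 2), False)
```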

ValueError: substring not found

Write the comments in the model code in English, not Chinese.

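Converting to ONNX:

1. Control flow that depends on the input data is not supported, i.e. the forward function must not contain conditional statements;

2. Calling .item() converts a tensor into an ordinary Python variable. Any logic that converts between tensors and ordinary variables may make ONNX inference incorrect.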
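The effect of `.item()` can be reproduced without an ONNX runtime by using `torch.jit.trace`, on the assumption that the export path is trace-based as described above. The function `scale_by_first` is invented for this demonstration.

```python
import torch


def scale_by_first(x: torch.Tensor) -> torch.Tensor:
    k = x[0].item()  # becomes a plain Python float
    return x * k


# Tracing stands in here for a trace-based export; it emits a TracerWarning
# because the Python number is recorded as a constant.
traced = torch.jit.trace(scale_by_first, torch.tensor([2.0, 1.0]))

new_input = torch.tensor([3.0, 1.0])
eager_out = scale_by_first(new_input)  # multiplies by 3.0
traced_out = traced(new_input)         # still multiplies by the recorded 2.0
```

Keeping the value as a tensor, for example `x * x[0]`, avoids the conversion and lets the graph follow the actual input.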

