
[Translation] Understanding LSTM Networks

Table of Contents

  • Understanding LSTM Networks
    • Recurrent Neural Networks
    • The Problem of Long-Term Dependencies
    • LSTM Networks
    • The Core Idea Behind LSTMs
    • Step-by-Step LSTM Walk Through
    • Variants on Long Short Term Memory
    • Conclusion
    • Acknowledgments

Four main questions:

  1. What is an LSTM?
  2. Why was it created?
  3. What does it do?
  4. How does it work?

This article is translated from Christopher Olah's blog post Understanding LSTM Networks.

It is based primarily on Understanding LSTM Networks (colah's blog) and includes the translation plus some of my own modest commentary.

The post [Translation] Understanding LSTM and Its Diagrams may further aid understanding.

What is an LSTM?

The following definition is taken from Baidu Baike:

LSTM (Long Short-Term Memory) is a kind of recurrent neural network suited to processing and predicting important events separated by relatively long intervals and delays in a time series.

Understanding LSTM Networks

Why was the LSTM created?

Recurrent Neural Networks

Humans don't start their thinking from scratch every second. As you read this essay, you understand each word based on your understanding of previous words. You don't throw everything away and start thinking from scratch again. Your thoughts have persistence.

Traditional neural networks can't do this, and it seems like a major shortcoming. For example, imagine you want to classify what kind of event is happening at every point in a movie. It's unclear how a traditional neural network could use its reasoning about previous events in the film to inform later ones.

Recurrent neural networks address this issue. They are networks with loops in them, allowing information to persist.

[Figure: Recurrent Neural Networks have loops.]

In the above diagram, a chunk of neural network, \(A\), looks at some input \(x_t\) and outputs a value \(h_t\). A loop allows information to be passed from one step of the network to the next.

These loops make recurrent neural networks seem kind of mysterious. However, if you think a bit more, it turns out that they aren't all that different than a normal neural network. A recurrent neural network can be thought of as multiple copies of the same network, each passing a message to a successor. Consider what happens if we unroll the loop:

[Figure: An unrolled recurrent neural network.]
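
To make the "multiple copies of the same network" picture concrete, here is a minimal NumPy sketch of a vanilla RNN unrolled over a sequence. It is my own illustration, not code from the original post; the function and weight names (rnn_forward, W_xh, W_hh, b_h) are made up for this example.

```python
import numpy as np

def rnn_forward(xs, h0, W_xh, W_hh, b_h):
    """Unroll a vanilla RNN over a sequence.

    xs : (T, input_dim) array, the inputs x_1 ... x_T
    h0 : (hidden_dim,) array, the initial hidden state
    Returns the hidden states h_1 ... h_T.
    """
    h, hs = h0, []
    for x_t in xs:                                  # the same weights are reused at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)    # h_t depends on x_t and h_{t-1}
        hs.append(h)                                # h_t is the message passed to the next copy
    return hs

# Tiny usage example with random, untrained weights (illustration only).
rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 5, 4
W_xh = 0.1 * rng.standard_normal((hidden_dim, input_dim))
W_hh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)
xs = rng.standard_normal((T, input_dim))
hs = rnn_forward(xs, np.zeros(hidden_dim), W_xh, W_hh, b_h)
print(len(hs), hs[-1].shape)  # 4 (5,)
```

The key point is that the same weights are applied at every time step; unrolling the loop simply makes each copy of the network explicit.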

This chain-like nature reveals that recurrent neural networks are intimately related to sequences and lists. They're the natural architecture of neural network to use for such data.

And they certainly are used! In the last few years, there have been incredible success applying RNNs to a variety of problems: speech recognition, language modeling, translation, image captioning… The list goes on. I'll leave discussion of the amazing feats one can achieve with RNNs to Andrej Karpathy's excellent blog post, The Unreasonable Effectiveness of Recurrent Neural Networks. But they really are pretty amazing.

Essential to these successes is the use of “LSTMs,” a very special kind of recurrent neural network which works, for many tasks, much much better than the standard version. Almost all exciting results based on recurrent neural networks are achieved with them. It's these LSTMs that this essay will explore.

RNN

Ordinary neural networks do not account for the persistence of information: earlier inputs usually influence how later inputs should be interpreted. To capture this, that is, to fix the inability of a traditional network to let previous events inform later ones, RNNs were proposed, adding loops to the network. The figure below shows an RNN.

[Figure: An RNN with a loop.]

An RNN is essentially a chain of copies of the same ordinary neural network, each copy passing a message to the next, as shown below:

[Figure: The chain structure of an unrolled RNN.]

"LSTMs",a very special kind of recurrent neural network which works,for many tasks,much much better tahn the standard version.

The Problem of Long-Term Dependencies

One of the appeals of RNNs is the idea that they might be able to connect previous information to the present task, such as using previous video frames might inform the understanding of the present frame. If RNNs could do this, they'd be extremely useful. But can they? It depends.

Sometimes, we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in “the clouds are in the sky,” we don't need any further context –– it's pretty obvious the next word is going to be sky. In such cases, where the gap between the relevant information and the place that it's needed is small, RNNs can learn to use the past information.

[Figure: A small gap between the relevant information and the place it is needed.]

But there are also cases where we need more context. Consider trying to predict the last word in the text “I grew up in France… I speak fluent French.” Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of France, from further back. It's entirely possible for the gap between the relevant information and the point where it is needed to become very large.

Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.

[Figure: A large gap between the relevant information and the place it is needed.]

In theory, RNNs are absolutely capable of handling such “long-term dependencies.” A human could carefully pick parameters for them to solve toy problems of this form. Sadly, in practice, RNNs don't seem to be able to learn them. The problem was explored in depth by Hochreiter (1991) [German] and Bengio, et al. (1994), who found some pretty fundamental reasons why it might be difficult.

Thankfully, LSTMs don't have this problem!

The Problem of Long-Term Dependencies[1]

RNN models can connect previous information to the present task, such as using previous video frames to inform the understanding of the present frame.

Can RNNs actually achieve this? It depends on the situation.

Sometimes we only need to look at recent information to perform the present task. For example, consider a language model trying to predict the next word based on the previous ones. If we are trying to predict the last word in "the clouds are in the sky," we do not need any further context: it is obvious that the next word will be sky. In this case the RNN only needs a small number n of past items; the gap between the relevant information and the place that it is needed is small.

But there are also cases that require much more context. Consider trying to predict the last word in "I grew up in France… I speak fluent French." The recent information, "I speak fluent French," suggests that the next word is probably the name of a language, but to narrow down which language we need the earlier background about France. Training an RNN here requires a large number n of past items; the gap between the relevant information and the point where it is needed becomes very large.

Unfortunately, as that gap grows, RNNs become unable to learn to connect the information.

In theory, RNNs can handle such "long-term dependencies," but in practice they cannot be trained to do so: when the amount of past information needed grows too large, RNNs no longer work well. The problem was explored in depth by Hochreiter (1991) [German] and Bengio, et al. (1994), who found some pretty fundamental reasons why it might be difficult.

LSTM models can handle the problem of long-term dependencies.

LSTM Networks

Long Short Term Memory networks – usually just called "LSTMs" – are a special kind of RNN, capable of learning long-term dependencies. They were introduced by Hochreiter & Schmidhuber (1997), and were refined and popularized by many people in following work. They work tremendously well on a large variety of problems, and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information for long periods of time is practically their default behavior, not something they struggle to learn!

All recurrent neural networks have the form of a chain of repeating modules of neural network. In standard RNNs, this repeating module will have a very simple structure, such as a single tanh layer.

[Figure: The repeating module in a standard RNN contains a single layer.]
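
For reference (my own note, not part of the original text): in the usual formulation, this single-tanh repeating module computes \(h_t = \tanh(W \cdot [h_{t-1}, x_t] + b)\), where \([h_{t-1}, x_t]\) denotes the concatenation of the previous hidden state and the current input.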

LSTMs also have this chain like structure, but the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.

[Figure: The repeating module in an LSTM contains four interacting layers.]

Don't worry about the details of what's going on. We'll walk through the LSTM diagram step by step later. For now, let's just try to get comfortable with the notation we'll be using.

[Figure: Notation: yellow boxes are learned neural network layers; pink circles are pointwise operations; lines carry vectors; merging lines denote concatenation; forking lines denote copying.]

In the above diagram, each line carries an entire vector, from the output of one node to the inputs of others. The pink circles represent pointwise operations, like vector addition, while the yellow boxes are learned neural network layers. Lines merging denote concatenation, while a line forking denotes its content being copied and the copies going to different locations.
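
As a preview of the step-by-step walkthrough in the original post, here is a minimal NumPy sketch of a single LSTM step in the standard formulation. The code is my own illustration; the weight and bias names (W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o) are placeholders, and each of the four learned layers acts on the concatenation \([h_{t-1}, x_t]\).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM step: four learned layers acting on [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)            # forget gate: what to discard from the cell state
    i_t = sigmoid(W_i @ z + b_i)            # input gate: which values to update
    c_hat = np.tanh(W_c @ z + b_c)          # candidate values for the cell state
    c_t = f_t * c_prev + i_t * c_hat        # new cell state
    o_t = sigmoid(W_o @ z + b_o)            # output gate: what part of the cell state to expose
    h_t = o_t * np.tanh(c_t)                # new hidden state / output
    return h_t, c_t
```

The cell state \(c_t\) runs through the chain with only these gated, elementwise interactions, which is what makes it easy for information to persist across many steps.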

What does an LSTM do?
