Word2Vec Study Summary

Study notes on Word2Vec, with a CBOW implementation in TensorFlow.

Neural Probabilistic Language Model

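Word vector: $v(w) \in R^m$
$m$ is the dimension of the word vectors, typically on the order of $10^1\sim 10^2$.
Network parameters: $W \in R^{n_h \times (n-1)m}$, $p \in R^{n_h}$, $U\in R^{n_h \times N}$, $q \in R^N$
$n$: the context size, usually no more than 5; as the shape of $W$ shows, the input is built from $n-1$ context word vectors.
$n_h$: the dimension of the hidden layer, chosen by the user, typically on the order of $10^2$.
$N$: the vocabulary size of the corpus, typically on the order of $10^4\sim 10^5$.
Backpropagation through the network updates $v(w)$, and the trained vectors are the final word embeddings (w2v).
$x_w$ is the sum of the context word vectors, $x_w = \sum_{u \in Context(w)} v(u)$; this is the input used in the CBOW formulas below.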
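The forward pass of this network is not written out here. A minimal NumPy sketch follows, under stated assumptions: the usual form $z = \tanh(Wx + p)$, $y = U^\top z + q$, followed by a softmax, with the input $x$ taken as the concatenation of the $n-1$ context vectors (which is what the shape of $W$ implies). The function name `nplm_forward`, the matrix `V` holding the word vectors, and the toy sizes are illustrative only.

```python
import numpy as np

def nplm_forward(context_ids, V, W, p, U, q):
    """Return p(word | context) for every word in the vocabulary."""
    x = V[context_ids].reshape(-1)   # concatenate the n-1 context vectors: shape ((n-1)*m,)
    z = np.tanh(W @ x + p)           # hidden layer: shape (n_h,)
    y = U.T @ z + q                  # one score per vocabulary word: shape (N,)
    y = y - y.max()                  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum()

# Toy sizes, chosen only for illustration
N, m, n, n_h = 20, 4, 3, 8
rng = np.random.default_rng(0)
V = rng.normal(scale=0.1, size=(N, m))              # one word vector v(w) per row
W = rng.normal(scale=0.1, size=(n_h, (n - 1) * m))
p = np.zeros(n_h)
U = rng.normal(scale=0.1, size=(n_h, N))
q = np.zeros(N)

probs = nplm_forward([3, 7], V, W, p, U, q)
print(probs.shape, probs.sum())
```

The last matrix product touches all $n_h \times N$ entries of $U$ for every training example, which is the cost the following sections try to avoid.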

CBOW

Skip-Gram

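How to handle too many classes?

The corpus vocabulary contains $N$ words, so the classifier has $N$ classes. Classifying over that many classes directly requires an output-layer matrix $U\in R^{n_h \times N}$, which is too large. There are two ways around this:

1. Turn the single multi-class classification into a sequence of binary classifications: Hierarchical Softmax. Each binary classifier (one per non-leaf node of the tree) has its own parameter vector. A single prediction uses only the nodes on one root-to-leaf path, roughly $\log_2 N \times m$ parameters, although the tree stores $N-1$ such vectors in total.
2. Negative sampling: every word gets its own parameter vector, so the parameter count, $N \times m$, drops only slightly, but each training step updates just a few of these vectors instead of all of them.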
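To make the comparison concrete, the sketch below counts output-layer parameters for the three approaches. It rests on simplifying assumptions: biases are ignored, the tree is treated as balanced (a Huffman tree gives frequent words shorter paths), a binary tree with $N$ leaves has $N-1$ non-leaf nodes, and `k` is a hypothetical number of negative samples per example. The sizes are illustrative, not taken from the code later in this post.

```python
import math

def output_layer_costs(N, m, n_h, k):
    """Stored parameters and parameters touched per training example."""
    path_len = math.ceil(math.log2(N))  # rough path length in a balanced tree
    return {
        "full softmax": {"stored": n_h * N, "touched": n_h * N},
        "hierarchical softmax": {"stored": (N - 1) * m, "touched": path_len * m},
        "negative sampling": {"stored": N * m, "touched": (1 + k) * m},
    }

costs = output_layer_costs(N=100_000, m=100, n_h=100, k=5)
for name, c in costs.items():
    print(f"{name:22s} stored={c['stored']:>10,d} touched per example={c['touched']:>10,d}")
```

Under these assumptions the stored parameter counts are nearly the same; the saving is in how few parameters each training example reads and updates.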

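Gradient computation for Hierarchical Softmax (CBOW)

$p^w$: the path from the root to the leaf node for $w$.
$l^w$: the number of nodes on the path $p^w$.
$p_1^w,p_2^w,\dots,p_{l^w}^w$: the $l^w$ nodes on the path $p^w$, where $p^w_1$ is the root and $p_{l^w}^w$ is the node for word $w$.
$d_2^w,\dots,d_{l^w}^w \in \{0,1\}$: the Huffman code of word $w$, made up of $l^w-1$ bits (the root has no code); $d_j^w$ is the code bit of the $j$-th node on the path $p^w$.
$\theta_1^w,\theta_2^w,\dots,\theta_{l^w-1}^w \in R^m$: the vectors of the non-leaf nodes on the path $p^w$; $\theta_j^w$ is the vector of the $j$-th non-leaf node on the path.

For any word $w$ in the dictionary $D$, the Huffman tree contains exactly one path $p^w$ from the root to the node for $w$. The path $p^w$ has $l^w-1$ branches; each branch is one binary classification and yields one probability. Multiplying all these probabilities gives the desired $p(w|Context(w))$.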

$$p(w|Context(w)) = \prod_{j=2}^{l^w}p(d_j^w|x_w,\theta^w_{j-1})$$

where:
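The per-node factor is not spelled out here. The sketch below assumes the usual word2vec convention, in which code bit 0 is the positive class of a logistic classifier: $p(d_j^w|x_w,\theta_{j-1}^w) = \sigma(x_w^\top\theta_{j-1}^w)^{1-d_j^w}\,(1-\sigma(x_w^\top\theta_{j-1}^w))^{d_j^w}$. With that assumption, the gradients of the log-probability are $(1-d_j^w-\sigma(x_w^\top\theta_{j-1}^w))\,x_w$ for $\theta_{j-1}^w$ and $\sum_j (1-d_j^w-\sigma(x_w^\top\theta_{j-1}^w))\,\theta_{j-1}^w$ for $x_w$. The function names and the three-leaf toy tree are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hs_log_prob(x_w, thetas, codes):
    """log p(w | Context(w)) along one root-to-leaf path.

    thetas: array (l_w - 1, m), vectors of the non-leaf nodes on the path
    codes:  array (l_w - 1,), Huffman code bits d_2 .. d_{l_w}
    """
    s = sigmoid(thetas @ x_w)
    return np.sum((1 - codes) * np.log(s) + codes * np.log(1 - s))

def hs_gradients(x_w, thetas, codes):
    """Gradients of the log-probability w.r.t. thetas and x_w."""
    g = 1 - codes - sigmoid(thetas @ x_w)     # one scalar per node on the path
    grad_thetas = g[:, None] * x_w[None, :]   # shape (l_w - 1, m)
    grad_x = g @ thetas                       # shape (m,)
    return grad_thetas, grad_x

# Toy tree with three leaves: root -> A (code 0); root -> inner (code 1) -> B (0) / C (1)
rng = np.random.default_rng(0)
m = 5
x_w = rng.normal(size=m)
theta_root, theta_inner = rng.normal(size=m), rng.normal(size=m)
paths = {
    "A": (np.array([theta_root]), np.array([0])),
    "B": (np.array([theta_root, theta_inner]), np.array([1, 0])),
    "C": (np.array([theta_root, theta_inner]), np.array([1, 1])),
}
total = sum(np.exp(hs_log_prob(x_w, th, d)) for th, d in paths.values())
print(total)  # the leaf probabilities form a distribution over the vocabulary
```

In CBOW the gradient with respect to $x_w$ is then passed on to the context word vectors, since $x_w$ is their sum.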

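Negative-sampling model (CBOW)

Goal: speed up training and improve the quality of the resulting word vectors.

Instead of a Huffman tree, it uses random negative sampling.

What is a negative sample: for a given $Context(w)$, the word $w$ is the positive sample and every other word is a negative sample.

The subset of negative samples: $NEG(w)$.

Given a positive sample $(Context(w),w)$, we want to maximize: $g(w) = \underset{u \in \{w\} \cup NEG(w)}{\prod}p(u|Context(w))$

where:

$L^w(\tilde w)$ indicates whether the word $\tilde w$ is the word $w$ itself (1 if $\tilde w = w$, 0 otherwise).

$\theta^u \in R^m$ is the auxiliary parameter vector of word $u$.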
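The factor $p(u|Context(w))$ is not written out here. The sketch below assumes the usual word2vec form, $p(u|Context(w)) = \sigma(x_w^\top\theta^u)^{L^w(u)}\,(1-\sigma(x_w^\top\theta^u))^{1-L^w(u)}$, and takes one gradient-ascent step on $\log g(w)$. The function names, the toy sizes, and the uniform sampling of $NEG(w)$ are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_objective(x_w, theta, w, negatives):
    """log g(w) for one positive word w and its sampled negatives NEG(w)."""
    words = np.concatenate(([w], negatives))
    labels = (words == w).astype(float)          # L^w(u): 1 for u == w, else 0
    s = sigmoid(theta[words] @ x_w)
    return np.sum(labels * np.log(s) + (1 - labels) * np.log(1 - s))

def neg_ascent_step(x_w, theta, w, negatives, lr=0.01):
    """One gradient-ascent step; only the 1 + |NEG(w)| selected rows of theta change."""
    words = np.concatenate(([w], negatives))
    labels = (words == w).astype(float)
    g = labels - sigmoid(theta[words] @ x_w)     # per-word error term
    new_x = x_w + lr * (g @ theta[words])
    new_theta = theta.copy()
    new_theta[words] += lr * g[:, None] * x_w[None, :]
    return new_x, new_theta

rng = np.random.default_rng(0)
N, m, k = 50, 8, 5
theta = rng.normal(scale=0.5, size=(N, m))   # one auxiliary vector per word
x_w = rng.normal(scale=0.5, size=m)          # summed context vectors
w = 7
candidates = np.array([u for u in range(N) if u != w])
negatives = rng.choice(candidates, size=k, replace=False)  # uniform sampling, a simplification

before = neg_log_objective(x_w, theta, w, negatives)
new_x, new_theta = neg_ascent_step(x_w, theta, w, negatives)
after = neg_log_objective(new_x, new_theta, w, negatives)
print(before, after)
```

Only `k + 1` of the `N` rows of `theta` are touched per step, which is where the speed-up over a full softmax comes from.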

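TensorFlow implementation

The code below trains the output layer with `tf.nn.sampled_softmax_loss`, a sampled approximation of the softmax over all classes that scores each true label against `num_sampled` randomly drawn classes. It uses neither hierarchical softmax nor the negative-sampling objective $g(w)$ described above.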

# These are all the modules we'll be using later. Make sure you can import them
# before proceeding further.
%matplotlib inline
from __future__ import print_function
import collections
import math
import numpy as np
import os
import random
import tensorflow as tf
import zipfile
from matplotlib import pylab
from six.moves import range
from six.moves.urllib.request import urlretrieve
from sklearn.manifold import TSNE

filename = "data/text8.zip"
def read_data(filename):
    """Extract the first file enclosed in a zip file as a list of words"""
    with zipfile.ZipFile(filename) as f:
        data = tf.compat.as_str(f.read(f.namelist()[0])).split()
    return data
  
words = read_data(filename)
print('Data size %d' % len(words))

Data size 17005207

print(words[:50])

['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the', 'diggers', 'of', 'the', 'english', 'revolution', 'and', 'the', 'sans', 'culottes', 'of', 'the', 'french', 'revolution', 'whilst', 'the', 'term', 'is', 'still', 'used', 'in', 'a', 'pejorative', 'way', 'to', 'describe', 'any', 'act', 'that', 'used', 'violent', 'means', 'to', 'destroy', 'the']

vocabulary_size = 50000

def build_dataset(words):
    count = [['UNK', -1]]
    count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
    dictionary = dict()
    for word, _ in count:
        dictionary[word] = len(dictionary)
    data = list()
    unk_count = 0
    for word in words:
        if word in dictionary:
            index = dictionary[word]
        else:
            index = 0  # dictionary['UNK']
            unk_count = unk_count + 1
        data.append(index)
    count[0][1] = unk_count
    reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys())) 
    return data, count, dictionary, reverse_dictionary

data, count, dictionary, reverse_dictionary = build_dataset(words)
print('Most common words (+UNK)', count[:5])
print('Sample data', data[:10])
del words  # Hint to reduce memory.

Most common words (+UNK) [['UNK', 418391], ('the', 1061396), ('of', 593677), ('and', 416629), ('one', 411764)]
Sample data [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]

CBOW implementation

data_index = 0
 
def generate_batch(batch_size, bag_window):
    global data_index
    span = 2 * bag_window + 1 # [ bag_window target bag_window ]
    batch = np.ndarray(shape=(batch_size, span - 1), dtype=np.int32)
    labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
    buffer = collections.deque(maxlen=span)
    for _ in range(span):
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)
    for i in range(batch_size):
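        # the middle element of the window is the label (target word);
        # the remaining 2 * bag_window words are its context
        buffer_list = list(buffer)
        labels[i, 0] = buffer_list.pop(bag_window)
        batch[i] = buffer_list
        # slide the window one word to the right (the deque drops the oldest word)
        buffer.append(data[data_index])
        data_index = (data_index + 1) % len(data)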
    return batch, labels
 
print('data:', [reverse_dictionary[di] for di in data[:16]])
 
for bag_window in [1, 2]:
    data_index = 0
    batch, labels = generate_batch(batch_size=4, bag_window=bag_window)
    print('\nwith bag_window = %d:' % (bag_window))
    print('    batch:', [[reverse_dictionary[w] for w in bi] for bi in batch])
    print('    labels:', [reverse_dictionary[li] for li in labels.reshape(4)])

data: ['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the']

with bag_window = 1:
    batch: [['anarchism', 'as'], ['originated', 'a'], ['as', 'term'], ['a', 'of']]
    labels: ['originated', 'as', 'a', 'term']

with bag_window = 2:
    batch: [['anarchism', 'originated', 'a', 'term'], ['originated', 'as', 'term', 'of'], ['as', 'a', 'of', 'abuse'], ['a', 'term', 'abuse', 'first']]
    labels: ['as', 'a', 'term', 'of']
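Each row of `batch` above holds the ids of the context words for one target word. In the graph that follows, these ids are looked up in the embedding matrix and summed over the context axis to form the CBOW input. A minimal NumPy sketch of that step, with toy sizes and a made-up embedding matrix chosen only so the result is easy to read:

```python
import numpy as np

vocabulary_size, embedding_size = 10, 3   # toy sizes for illustration
embeddings = np.arange(vocabulary_size * embedding_size, dtype=np.float64).reshape(
    vocabulary_size, embedding_size)

# Two training examples with bag_window = 1, i.e. two context ids each
batch = np.array([[0, 2],
                  [1, 3]])

embeds = embeddings[batch]        # shape (batch_size, 2 * bag_window, embedding_size)
x_w = embeds.sum(axis=1)          # shape (batch_size, embedding_size)
print(embeds.shape, x_w.shape)
print(x_w)
```

Indexing with `embeddings[batch]` plays the role of `tf.nn.embedding_lookup`, and `sum(axis=1)` plays the role of `tf.reduce_sum(embeds, 1)`.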

batch_size = 128
embedding_size = 128  # Dimension of the embedding vector.
bag_window = 2  # How many words to consider left and right.
# We pick a random validation set to sample nearest neighbors. here we limit the
# validation samples to the words that have a low numeric ID, which by
# construction are also the most frequent.
valid_size = 16  # Random set of words to evaluate similarity on.
valid_window = 100  # Only pick dev samples in the head of the distribution.
valid_examples = np.array(random.sample(range(valid_window), valid_size))
num_sampled = 64  # Number of negative examples to sample.
 
graph = tf.Graph()
 
with graph.as_default():
    # Input data.
    train_dataset = tf.placeholder(tf.int32, shape=[batch_size, bag_window * 2])
    train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
    valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
 
    # Variables.
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    softmax_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    softmax_biases = tf.Variable(tf.zeros([vocabulary_size]))
 
    # Model.
    # Look up embeddings for inputs.
    embeds = tf.nn.embedding_lookup(embeddings, train_dataset)
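    # Compute the sampled softmax loss: each step scores the true label against
    # num_sampled randomly drawn classes. The CBOW input is the sum of the
    # context embeddings (axis 1 indexes the context positions).
    loss = tf.reduce_mean(
        tf.nn.sampled_softmax_loss(weights=softmax_weights,
                                   biases=softmax_biases,
                                   labels=train_labels,
                                   inputs=tf.reduce_sum(embeds, 1),
                                   num_sampled=num_sampled,
                                   num_classes=vocabulary_size))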
 
    # Optimizer.
    optimizer = tf.train.AdagradOptimizer(1.0).minimize(loss)
 
    # Compute the similarity between the validation examples and all embeddings.
    # We use the cosine similarity:
    norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
    normalized_embeddings = embeddings / norm
    valid_embeddings = tf.nn.embedding_lookup(
        normalized_embeddings, valid_dataset)
    similarity = tf.matmul(valid_embeddings, tf.transpose(normalized_embeddings))
num_steps = 100001

with tf.Session(graph=graph) as session:
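    # tf.initialize_all_variables() is deprecated; tf.global_variables_initializer()
    # is the replacement (see the warning in the output below).
    tf.initialize_all_variables().run()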
    print('Initialized')
    average_loss = 0
    for step in range(num_steps):
        batch_data, batch_labels = generate_batch(
            batch_size, bag_window)
        feed_dict = {train_dataset: batch_data, train_labels: batch_labels}
        _, l = session.run([optimizer, loss], feed_dict=feed_dict)
        average_loss += l
        if step % 2000 == 0:
            if step > 0:
                average_loss = average_loss / 2000
            # The average loss is an estimate of the loss over the last 2000 batches.
            print('Average loss at step %d: %f' % (step, average_loss))
            average_loss = 0
        # note that this is expensive (~20% slowdown if computed every 500 steps)
        if step % 10000 == 0:
            sim = similarity.eval()
            for i in range(valid_size):
                valid_word = reverse_dictionary[valid_examples[i]]
                top_k = 8  # number of nearest neighbors
                nearest = (-sim[i, :]).argsort()[1:top_k + 1]
                log = 'Nearest to %s:' % valid_word
                for k in range(top_k):
                    close_word = reverse_dictionary[nearest[k]]
                    log = '%s %s,' % (log, close_word)
                print(log)
    final_embeddings = normalized_embeddings.eval()
num_points = 400

# Compute t-SNE to visualize how the word vectors are distributed in 2-D
tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
two_d_embeddings = tsne.fit_transform(final_embeddings[1:num_points + 1, :])
 
def plot(embeddings, labels):
    assert embeddings.shape[0] >= len(labels), 'More labels than embeddings'
    pylab.figure(figsize=(15, 15))  # in inches
    for i, label in enumerate(labels):
        x, y = embeddings[i, :]
        pylab.scatter(x, y)
        pylab.annotate(label, xy=(x, y), xytext=(5, 2), textcoords='offset points',
                       ha='right', va='bottom')
    pylab.show()
 
 
words = [reverse_dictionary[i] for i in range(1, num_points + 1)]
plot(two_d_embeddings, words)

WARNING:tensorflow:From <ipython-input-13-a25f3f405467>:2: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
Initialized
Average loss at step 0: 8.374084
Nearest to first: mistletoe, ladyland, fathered, xml, exploitable, scare, fabian, mages,
Nearest to nine: going, transferred, copacabana, lilian, dhamma, continuity, ivy, churchill,
Nearest to all: patanjali, alternates, benign, exponentially, amazons, fullback, analyzed, marlon,
Nearest to most: api, treating, rehab, objectors, disastrously, chamada, fugitives, subhas,
Nearest to or: chiapas, travels, dazzling, shag, geopolitical, below, less, granted,
Nearest to that: mozambican, tome, buddhism, concentrations, unison, allied, sotho, wu,
Nearest to use: gaff, marmalade, harford, ujjain, taxonomists, ballroom, rarity, persisted,
Nearest to into: konkan, indefatigable, ams, caret, irreverent, allman, alcal, rivalry,
Nearest to system: pressurized, pedestal, affect, powered, preferably, gcd, morning, skyview,
Nearest to were: campaigner, unusable, copyrights, wells, waterhouse, janata, touch, engraving,
Nearest to with: ranges, crystallographic, gy, condensate, rojas, bijective, maria, university,
Nearest to th: polycarbonate, irene, nik, shellfish, basenjis, measurements, reclassification, mens,
Nearest to zero: thessaly, blows, contingents, moulton, hurdler, flip, hbf, minotaurs,
Nearest to other: purchased, concluding, sicily, midwest, paranoid, hurt, ssc, cobra,
Nearest to world: deoxyribonucleic, rectum, stricken, oven, mt, diverged, pluralistic, perceptive,
Nearest to during: connelly, thirsty, nationalized, protector, lick, lockport, judgements, russcol,
Average loss at step 2000: 4.572921
Average loss at step 4000: 3.943233
Average loss at step 6000: 3.713809
Average loss at step 8000: 3.530180
Average loss at step 10000: 3.475878
Nearest to first: second, civilization, salsa, divisor, essay, braves, mikvah, tiered,
Nearest to nine: six, eight, seven, zero, five, four, two, three,
Nearest to all: many, some, these, tito, several, steadfast, other, stockton,
Nearest to most: more, tendency, waff, many, very, clerk, enhanced, juvenile,
Nearest to or: than, and, ewald, cmi, aedile, gangster, pipe, suns,
Nearest to that: which, what, marlins, gunto, who, redirected, competitors, syrups,
Nearest to use: taxonomists, specialisation, marmalade, flaherty, reason, salvador, trojans, precarious,
Nearest to into: caret, ecumenism, ams, macrocosm, thales, indefatigable, konkan, disobey,
Nearest to system: assumes, affect, compilation, handicrafts, slugs, electro, apostles, promise,
Nearest to were: are, was, being, sketch, encloses, yorkers, kekul, cadbury,
Nearest to with: rheingold, grossed, satanic, temp, mond, fairly, cavers, rickenbacker,
Nearest to th: nd, st, nine, globally, patrilineal, transhumanism, companionship, vortex,
Nearest to zero: five, nine, seven, eight, four, three, six, two,
Nearest to other: many, these, various, duquette, several, arlene, all, some,
Nearest to world: holliday, postwar, arnauld, catechism, inventive, caribs, recipient, cultivate,
Nearest to during: nas, hohenstaufen, before, cautiously, acknowledging, in, cadet, since,
Average loss at step 12000: 3.493503
Average loss at step 14000: 3.446680
Average loss at step 16000: 3.443846
Average loss at step 18000: 3.397030
Average loss at step 20000: 3.218047
Nearest to first: second, last, chaplin, buoyancy, civilization, next, most, title,
Nearest to nine: eight, seven, zero, five, two, three, six, th,
Nearest to all: many, several, various, some, lehi, chia, cal, these,
Nearest to most: more, many, tendency, very, waff, fund, some, clerk,
Nearest to or: and, than, pagos, shapeshifting, wtoo, grigory, slew, rss,
Nearest to that: which, what, where, this, when, grant, bounding, agnes,
Nearest to use: form, trojans, reason, falsified, because, specialisation, fortify, price,
Nearest to into: through, resupply, ams, within, from, in, thales, ecumenism,
Nearest to system: systems, electro, sector, orbison, violate, baptised, foonly, affect,
Nearest to were: are, was, is, have, include, pederastic, cuar, overcrowded,
Nearest to with: using, between, and, sampras, soddy, gracie, semper, generically,
Nearest to th: st, twentieth, nd, globally, nine, renminbi, worsened, alexandrian,
Nearest to zero: five, eight, seven, six, nine, four, three, two,
Nearest to other: various, different, these, both, colonisation, titration, many, tankers,
Nearest to world: amylase, metaphor, mandibles, recipient, policeman, prisoner, postwar, spans,
Nearest to during: before, since, throughout, in, until, after, through, laminated,
Average loss at step 22000: 3.358755
Average loss at step 24000: 3.296899
Average loss at step 26000: 3.254450
Average loss at step 28000: 3.287042
Average loss at step 30000: 3.228198
Nearest to first: second, last, third, next, same, largest, autonomously, best,
Nearest to nine: eight, seven, six, zero, five, four, three, two,
Nearest to all: any, many, every, some, both, various, several, each,
Nearest to most: many, some, more, less, incredible, tendency, waff, huffman,
Nearest to or: and, than, linguistics, gangster, worldstatesmen, but, like, napoli,
Nearest to that: which, this, what, where, when, however, because, syrups,
Nearest to use: sense, trojans, development, begin, cav, stampede, safety, price,
Nearest to into: through, within, in, thales, resupply, down, throughout, concocted,
Nearest to system: systems, electro, affect, violate, foonly, companies, era, rolfe,
Nearest to were: are, was, include, heero, is, have, firmness, being,
Nearest to with: between, nonnegative, soddy, timer, fingerprinting, satanic, including, letterboxed,
Nearest to th: twentieth, nd, st, nine, globally, next, renminbi, propagated,
Nearest to zero: five, nine, eight, six, seven, four, three, two,
Nearest to other: various, these, different, vespers, heterochromatin, sabina, dolphins, fingertips,
Nearest to world: catechism, decade, u, civil, mandibles, misanthropy, lrv, alcoholism,
Nearest to during: after, before, throughout, until, since, in, cordwainer, cambrian,
Average loss at step 32000: 3.013517
Average loss at step 34000: 3.205457
Average loss at step 36000: 3.187721
Average loss at step 38000: 3.158000
Average loss at step 40000: 3.159957
Nearest to first: second, last, same, best, third, next, fourth, original,
Nearest to nine: eight, seven, six, five, four, zero, two, th,
Nearest to all: every, many, any, several, some, various, fluke, certain,
Nearest to most: less, more, some, highest, authorize, many, refracting, waff,
Nearest to or: and, than, filioque, adour, directs, lamar, occult, decried,
Nearest to that: which, although, however, monatomic, evangelica, attrition, this, repudiate,
Nearest to use: sense, trojans, addition, production, achieve, begin, influx, falsified,
Nearest to into: through, within, throughout, in, down, heather, from, across,
Nearest to system: systems, foonly, program, affect, process, materiel, monitor, method,
Nearest to were: are, was, is, being, include, have, had, became,
Nearest to with: between, within, satanic, soddy, meals, without, utilizes, adage,
Nearest to th: st, nd, twentieth, globally, nine, worsened, propagated, fourth,
Nearest to zero: eight, five, four, six, seven, three, million, two,
Nearest to other: various, some, vespers, copyrighted, these, those, tether, different,
Nearest to world: u, experience, catechism, postwar, relieve, lydian, misanthropy, goodyear,
Nearest to during: after, in, since, before, throughout, within, despite, until,
Average loss at step 42000: 3.208765
Average loss at step 44000: 3.141568
Average loss at step 46000: 3.142206
Average loss at step 48000: 3.058366
Average loss at step 50000: 3.058806
Nearest to first: last, second, next, same, third, original, best, largest,
Nearest to nine: eight, seven, six, five, zero, four, th, august,
Nearest to all: several, many, some, various, multiple, these, certain, any,
Nearest to most: especially, many, more, particularly, less, refracting, discoveries, johansson,
Nearest to or: and, gangster, ewald, than, remedial, uses, yahoo, sherpa,
Nearest to that: which, although, what, because, monatomic, bounding, actually, syrups,
Nearest to use: addition, used, influx, cybernetic, admission, treatment, development, noriega,
Nearest to into: through, from, within, around, across, in, throughout, out,
Nearest to system: systems, program, process, foonly, violate, apostles, electro, region,
Nearest to were: are, was, being, is, have, had, include, be,
Nearest to with: soddy, between, using, utilizes, erase, hypoxic, primes, satanic,
Nearest to th: nd, st, twentieth, nineteenth, worsened, nine, globally, third,
Nearest to zero: eight, five, seven, four, six, two, nine, three,
Nearest to other: others, older, various, queueing, smaller, different, vespers, these,
Nearest to world: catechism, plural, label, country, policeman, romanians, abdication, postwar,
Nearest to during: throughout, until, within, in, after, since, through, following,
Average loss at step 52000: 3.099547
Average loss at step 54000: 3.092599
Average loss at step 56000: 2.922255
Average loss at step 58000: 3.034626
Average loss at step 60000: 3.047308
Nearest to first: last, second, next, third, same, earliest, original, latter,
Nearest to nine: eight, seven, six, five, four, zero, two, november,
Nearest to all: several, many, certain, various, some, every, those, both,
Nearest to most: more, especially, some, highest, particularly, greatest, waff, best,
Nearest to or: and, than, ventricle, yahoo, herders, canonised, repackaged, including,
Nearest to that: which, what, however, bao, where, when, burgh, because,
Nearest to use: cybernetic, addition, achieve, sense, tinctures, falsified, because, influx,
Nearest to into: through, within, throughout, across, around, greaves, from, towards,
Nearest to system: systems, brussel, process, program, apostles, position, adonai, informal,
Nearest to were: are, was, is, include, several, remained, being, pederastic,
Nearest to with: between, nonnegative, including, without, soddy, against, poorest, indie,
Nearest to th: twentieth, nd, st, third, globally, nineteenth, worsened, nine,
Nearest to zero: five, eight, six, seven, three, four, nine, million,
Nearest to other: various, others, queueing, older, those, different, several, these,
Nearest to world: catechism, country, condemned, label, romanians, orchids, ran, ruling,
Nearest to during: before, after, throughout, until, despite, since, over, in,
Average loss at step 62000: 3.018039
Average loss at step 64000: 2.928474
Average loss at step 66000: 2.937294
Average loss at step 68000: 2.963855
Average loss at step 70000: 3.007470
Nearest to first: last, next, second, best, earliest, same, third, fourth,
Nearest to nine: eight, th, six, zero, september, july, december, million,
Nearest to all: many, several, some, certain, various, both, every, phylum,
Nearest to most: more, especially, greatest, extremely, highest, many, less, huffman,
Nearest to or: and, than, ventricle, jewellery, gigahertz, sherpa, including, towards,
Nearest to that: which, where, bounding, monatomic, what, because, repudiate, although,
Nearest to use: practice, because, addition, trojans, outspoken, treatment, play, implementation,
Nearest to into: through, towards, across, within, toward, kearney, in, forward,
Nearest to system: systems, spearheaded, severed, decapitate, foonly, jumbo, pegbox, jermaine,
Nearest to were: are, was, is, been, newly, many, exist, potter,
Nearest to with: between, without, soddy, letterboxed, including, ehr, utilizes, satanic,
Nearest to th: nine, st, twentieth, nd, nineteenth, third, globally, worsened,
Nearest to zero: five, two, six, three, eight, seven, nine, million,
Nearest to other: various, older, mouthpieces, those, these, smaller, fewer, pikemen,
Nearest to world: catechism, orchids, plural, roller, kowloon, domesticated, u, prisoner,
Nearest to during: throughout, in, until, before, after, since, at, despite,
Average loss at step 72000: 2.943993
Average loss at step 74000: 2.870428
Average loss at step 76000: 3.000972
Average loss at step 78000: 3.000503
Average loss at step 80000: 2.843173
Nearest to first: last, next, earliest, same, second, best, original, greatest,
Nearest to nine: eight, six, seven, zero, five, three, september, th,
Nearest to all: many, any, every, several, various, some, none, both,
Nearest to most: more, many, less, some, rossellini, highest, greatest, particularly,
Nearest to or: and, but, though, herders, modifiers, than, wtoo, rss,
Nearest to that: which, this, bounding, although, where, why, because, opacity,
Nearest to use: cybernetic, practice, baptize, treatment, addition, trojans, favor, sense,
Nearest to into: through, across, within, from, out, around, forward, kearney,
Nearest to system: systems, spearheaded, foonly, severed, process, monitor, achievement, version,
Nearest to were: are, was, is, various, be, had, pederastic, newly,
Nearest to with: between, husbandry, dalai, without, letterboxed, mansfield, using, against,
Nearest to th: twentieth, st, nd, nineteenth, third, nine, rd, globally,
Nearest to zero: six, seven, eight, five, nine, four, million, three,
Nearest to other: various, these, both, others, vespers, individual, different, older,
Nearest to world: catechism, orchids, king, domesticated, ries, arnauld, country, kowloon,
Nearest to during: throughout, before, after, since, until, in, through, despite,
Average loss at step 82000: 2.942490
Average loss at step 84000: 2.895405
Average loss at step 86000: 2.918082
Average loss at step 88000: 2.947470
Average loss at step 90000: 2.828935
Nearest to first: last, next, earliest, same, best, second, original, oldest,
Nearest to nine: eight, seven, six, zero, five, th, four, late,
Nearest to all: various, some, every, many, none, several, unattested, any,
Nearest to most: especially, many, some, more, less, particularly, highest, best,
Nearest to or: and, than, while, ewald, without, moraine, tala, aka,
Nearest to that: which, where, bounding, what, this, although, who, actually,
Nearest to use: clover, outspoken, amount, because, implementation, cybernetic, standard, share,
Nearest to into: through, within, across, beyond, out, from, forward, throughout,
Nearest to system: systems, brussel, spearheaded, sector, process, pls, violate, apostles,
Nearest to were: are, was, is, newly, rabid, had, be, include,
Nearest to with: between, without, hypoxic, letterboxed, fingerprinting, involving, when, featuring,
Nearest to th: nd, twentieth, st, nine, next, fourth, nineteenth, third,
Nearest to zero: five, eight, six, four, nine, three, seven, two,
Nearest to other: others, fewer, various, older, different, these, monstrous, soaring,
Nearest to world: catechism, orchids, nordic, domesticated, country, puritan, arnauld, plural,
Nearest to during: throughout, after, in, before, since, at, until, over,
Average loss at step 92000: 2.886946
Average loss at step 94000: 2.881201
Average loss at step 96000: 2.710057
Average loss at step 98000: 2.458667
Average loss at step 100000: 2.709969
Nearest to first: last, next, second, earliest, best, third, greatest, same,
Nearest to nine: eight, seven, five, six, four, zero, three, th,
Nearest to all: every, many, various, several, some, any, always, both,
Nearest to most: particularly, more, especially, some, less, many, highly, increasingly,
Nearest to or: and, than, but, ewald, while, rss, canonised, whereas,
Nearest to that: which, where, although, when, whom, opacity, bounding, monatomic,
Nearest to use: version, usage, clover, practice, influx, treatment, consisted, amount,
Nearest to into: through, across, down, within, beyond, out, towards, from,
Nearest to system: systems, brussel, process, corporation, era, bachelor, concept, spearheaded,
Nearest to were: are, was, include, is, exist, had, have, contain,
Nearest to with: between, through, without, among, amongst, hypoxic, celtics, involving,
Nearest to th: twentieth, nd, nineteenth, fourth, rd, st, tenth, seventh,
Nearest to zero: five, eight, four, seven, six, three, two, nine,
Nearest to other: fewer, others, both, older, these, various, smaller, monstrous,
Nearest to world: catechism, orchids, u, domesticated, falklands, ephesus, cartographer, nordic,
Nearest to during: throughout, despite, in, until, since, at, after, within,
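The "Nearest to ..." lines above come from ranking all embeddings by cosine similarity to each validation word. A small NumPy sketch of the same ranking on a made-up 2-D embedding matrix (the function name `nearest_neighbors` and the toy vectors are illustrative):

```python
import numpy as np

def nearest_neighbors(embeddings, word_id, top_k):
    """Ids of the top_k rows most cosine-similar to embeddings[word_id]."""
    norm = np.sqrt(np.sum(np.square(embeddings), axis=1, keepdims=True))
    normalized = embeddings / norm
    sim = normalized @ normalized[word_id]
    # Index 0 of the sorted order is the word itself (similarity 1), so skip it
    return (-sim).argsort()[1:top_k + 1]

toy = np.array([[1.0, 0.0],
                [0.9, 0.1],
                [0.0, 1.0],
                [-1.0, 0.0]])
print(nearest_neighbors(toy, word_id=0, top_k=2))
```

Because the rows are normalized to unit length first, the dot product equals the cosine of the angle between two word vectors, so vector length does not affect the ranking.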