Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
The implementation of logits_to_logprobs should be 1) fast and 2) equivalent to llama.cpp's implementation.
Actual Behavior
The implementation is neither fast nor equivalent to llama.cpp's implementation.
Context
For a project I'm working on, I need to compute the logprobs for all prompt tokens, and I need to compute them quickly. For that reason, I looked into using the Bloom model.
Bloom has a much larger vocabulary size (~250k vs ~30k iirc).
This difference meant that the existing code to convert logits to logprobs (below) ran very slowly since it wasn't using numpy.
Original pure-Python implementation
import math
from typing import List

class Llama:
    @staticmethod
    def logits_to_logprobs(logits: List[float]) -> List[float]:
        exps = [math.exp(float(x)) for x in logits]
        sum_exps = sum(exps)
        return [math.log(x / sum_exps) for x in exps]
I converted this code into numpy (below) but I started getting warnings about infinities.
Original Numpy implementation (produces NaNs)
import numpy as np

class Llama:
    @staticmethod
    def logits_to_logprobs(logits: list[float] | np.ndarray) -> np.ndarray:
        exps = np.exp(logits, dtype=np.single)  # overflows to inf for large logits
        sum_exps = exps.sum()
        np.divide(exps, sum_exps, out=exps)     # inf / inf -> nan
        np.log(exps, out=exps)                  # log(0) -> -inf
        return exps
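As an aside, the overflow itself is easy to reproduce in isolation. This is just a sketch (not code from the library), using made-up float32 logits in the hundreds, which is the range the Bloom scores in the benchmark output below actually sit in:
import numpy as np

# float32 exp() overflows for arguments above ~88.7, so logits in the hundreds
# become inf; inf / inf is then nan, and log(0) is -inf further downstream.
logits = np.array([317.3, 323.5, 328.9], dtype=np.float32)
exps = np.exp(logits)        # RuntimeWarning: overflow encountered in exp -> [inf inf inf]
print(exps / exps.sum())     # [nan nan nan]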
This prompted me to look into what llama.cpp does (relevant code below):
llama.cpp C++ implementation
struct llama_logit_info {
struct sum_exp {
float max_l;
float operator()(float sum, float l) const { return sum + std::exp(l - max_l); }
};
llama_logit_info(llama_context * ctx)
: logits(llama_get_logits(ctx))
, n_vocab(llama_n_vocab(llama_get_model(ctx)))
, max_l(*std::max_element(logits, logits + n_vocab))
, normalizer(1.0f / std::accumulate(logits, logits + n_vocab, 0.0f, sum_exp{max_l}))
{ }
float probability_from_logit(float logit) const {
return normalizer * std::exp(logit - max_l);
}
};
Translated into Python, this is equivalent to:
llama.cpp pure-Python implementation
class Llama:
    @staticmethod
    def logits_to_logprobs(logits: list[float]) -> list[float]:
        from math import exp, log
        maximum = max(logits)
        normalizer = 1.0 / sum(exp(logit - maximum) for logit in logits)
        probs = [normalizer * exp(logit - maximum) for logit in logits]
        logprobs = [log(prob) for prob in probs]
        return logprobs
I've spent a fair amount of time trying to work out what the difference between these functions is. They result in roughly the same values (differing at the 6th decimal place for the ones I've directly looked into), but mathematically, they appear very different to me.
Regardless, they both produce output that is very similar (detailed in a later section).
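To double-check that similarity numerically, here is a small comparison of the two formulations (just a sketch, separate from the benchmark script further down; it uses small random logits so the naive form does not overflow):
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=1_000)

# Naive form: log(exp(x_i) / sum_j exp(x_j))
exps = np.exp(logits)
naive = np.log(exps / exps.sum())

# Max-subtracted form (what llama.cpp does): multiplying numerator and
# denominator by exp(-max) cancels, so the values only differ by rounding.
shifted = np.exp(logits - logits.max())
stable = np.log(shifted / shifted.sum())

print(np.abs(naive - stable).max())  # on the order of 1e-15 in float64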
I've also translated this to numpy. This next function is my primary contribution here.
llama.cpp optimized Numpy implementation
class Llama:
    @staticmethod
    def logits_to_logprobs(logits: np.ndarray) -> np.ndarray:
        maximum = np.max(logits)
        tmp = np.subtract(logits, maximum)
        np.exp(tmp, out=tmp)
        normalizer = 1.0 / np.sum(tmp)
        np.multiply(normalizer, tmp, out=tmp)
        np.log(tmp, out=tmp)
        return tmp
llama.cpp straightforward Numpy implementation (slower than the previous one by a factor of ~2x)
class Llama:
    @staticmethod
    def logits_to_logprobs(logits: np.ndarray) -> np.ndarray:
        maximum = np.max(logits)
        normalizer = 1.0 / np.sum(np.exp(logits - maximum))
        probs = normalizer * np.exp(logits - maximum)
        logprobs = np.log(probs)
        return logprobs
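To compare the two numpy variants without loading a model, here is a standalone micro-benchmark sketch (hypothetical, not the script I actually ran for the numbers below; it just uses a random float32 vector of roughly Bloom's vocabulary size). Whether the ~2x gap reproduces exactly will depend on the numpy build and hardware:
import timeit
import numpy as np

rng = np.random.default_rng(0)
# Random stand-in for one row of llama.scores: ~Bloom-sized vocab, float32.
logits = rng.normal(loc=400.0, scale=15.0, size=250_880).astype(np.float32)

def straightforward(logits: np.ndarray) -> np.ndarray:
    maximum = np.max(logits)
    normalizer = 1.0 / np.sum(np.exp(logits - maximum))
    return np.log(normalizer * np.exp(logits - maximum))

def optimized(logits: np.ndarray) -> np.ndarray:
    maximum = np.max(logits)
    tmp = np.subtract(logits, maximum)   # one temporary array, reused in place below
    np.exp(tmp, out=tmp)
    normalizer = 1.0 / np.sum(tmp)
    np.multiply(normalizer, tmp, out=tmp)
    np.log(tmp, out=tmp)
    return tmp

print('straightforward:', timeit.timeit(lambda: straightforward(logits), number=200))
print('optimized:      ', timeit.timeit(lambda: optimized(logits), number=200))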
Benchmark
I ran this code:
Benchmark Script
#!/usr/bin/env -S ./venv/bin/python3
import llama_cpp, numpy as np
llama = llama_cpp.Llama(
model_path="{{ model }}",
n_gpu_layers=1000,
n_ctx=1024,
n_batch=16,
logits_all=True,
verbose=True,
)
print(f'{llama = }')
prompt = \
('A large language model (LLM) is a type of language model notable for its '
'ability to achieve general-purpose language understanding and generation. '
'LLMs acquire these abilities by using massive amounts of data to learn '
'billions of parameters during training and consuming large computational '
'resources during their training and operation. LLMs are artificial neural '
'networks (mainly transformers) and are (pre-)trained using self-supervised '
'learning and semi-supervised learning.\n'
'As autoregressive language models, they work by taking an input text and '
'repeatedly predicting the next token or word. Up to 2020, fine tuning was '
'the only way a model could be adapted to be able to accomplish specific '
'tasks. Larger sized models, such as GPT-3, however, can be prompt-engineered '
'to achieve similar results. They are thought to acquire knowledge about '
'syntax, semantics and "ontology" inherent in human language corpora, but '
'also inaccuracies and biases present in the corpora.Notable examples include '
"OpenAI's GPT models (e.g., GPT-3.5 and GPT-4, used in ChatGPT), Google's "
"PaLM (used in Bard), and Meta's LLaMa, as well as BLOOM, Ernie 3.0 Titan, "
"and Anthropic's Claude 2.\n"
'\n'
'\n'
'== Dataset preprocessing ==\n'
'\n'
'\n'
'=== Probabilistic tokenization ===\n'
'Using a modification of byte-pair encoding, in the first step, all unique '
'characters (including blanks and punctuation marks) are treated as an '
'initial set of n-grams (i.e. initial set of uni-grams). Successively the '
'most frequent pair of adjacent characters is merged into a bi-gram and all '
'instances of the pair are replaced by it. All occurrences of adjacent pairs '
'of (previously merged) n-grams that most frequently occur together are then '
'again merged into even lengthier n-gram repeatedly until a vocabulary of '
'prescribed size is obtained (in case of GPT-3, the size is 50257). Token '
'vocabulary consists of integers, spanning from zero up to the size of the '
'token vocabulary. New ')
prompt = prompt.encode('utf-8')
print(f'{prompt = }')
token_ids = llama.tokenize(prompt)
print(f'{token_ids = }')
llama.eval(token_ids)
print(f'{llama.scores = }')
print(f'{llama.scores.shape = }')
def a_logits_to_logprobs(logits: list[float]) -> list[float]:
import math
exps = [math.exp(float(x)) for x in logits]
sum_exps = sum(exps)
return [math.log(x / sum_exps) for x in exps]
def b_logits_to_logprobs(scores: list[float] | np.ndarray) -> list[float] | np.ndarray:
exps = np.exp(scores)
sum_exps = exps.sum()
return np.log(np.divide(exps, sum_exps))
def c_logits_to_logprobs(logits: list[float]) -> list[float]:
from math import exp, log
maximum = max(logits)
normalizer = 1.0 / sum(exp(logit - maximum) for logit in logits)
probs = [normalizer * exp(logit - maximum) for logit in logits]
logprobs = [log(prob) for prob in probs]
return logprobs
def d_logits_to_logprobs(logits: np.ndarray) -> np.ndarray:
maximum = np.max(logits)
normalizer = 1.0 / np.sum(np.exp(logits - maximum))
probs = normalizer * np.exp(logits - maximum)
logprobs = np.log(probs)
return logprobs
def e_logits_to_logprobs(logits: np.ndarray) -> np.ndarray:
maximum = np.max(logits)
tmp = np.subtract(logits, maximum)
np.exp(tmp, out=tmp)
normalizer = 1.0 / np.sum(tmp)
np.multiply(normalizer, tmp, out=tmp)
np.log(tmp, out=tmp)
return tmp
a_duration = 0
b_duration = 0
c_duration = 0
d_duration = 0
e_duration = 0
the_scores = []
the_logprobs = []
the_tokens = []
for i, token_id in enumerate(token_ids[1:]):
scores = llama.scores[i, :]
the_scores.append(scores[token_id])
import time
duration = time.time()
for _ in range(1):
logprobs = a_logits_to_logprobs(scores)
duration = time.time() - duration
a_duration += duration
print(f'a: {logprobs[token_id]:10.6f}', end=' ')
duration = time.time()
for _ in range(1):
logprobs = b_logits_to_logprobs(scores)
duration = time.time() - duration
b_duration += duration
print(f'b: {logprobs[token_id]:10.6f}', end=' ')
duration = time.time()
for _ in range(1):
logprobs = c_logits_to_logprobs(scores)
duration = time.time() - duration
c_duration += duration
print(f'c: {logprobs[token_id]:10.6f}', end=' ')
duration = time.time()
for _ in range(1):
logprobs = d_logits_to_logprobs(scores)
duration = time.time() - duration
d_duration += duration
print(f'd: {logprobs[token_id]:10.6f}', end=' ')
duration = time.time()
for _ in range(1):
logprobs = e_logits_to_logprobs(scores)
duration = time.time() - duration
e_duration += duration
print(f'e: {logprobs[token_id]:10.6f}', end=' ')
the_logprobs.append(-logprobs[token_id])
token = llama.detokenize([token_id])
print(token)
the_tokens.append(token)
print(f'{the_logprobs = }')
print(f'{the_scores = }')
print(f'{the_tokens = }')
print(f'{a_duration = }')
print(f'{b_duration = }')
print(f'{c_duration = }')
print(f'{d_duration = }')
print(f'{e_duration = }')
The relevant lines at the end show the performance differences. These are the cumulative times spent in the different functions (labeled a, b, c, d, e) over the entire prompt.
- a: Original pure-Python implementation
- b: Original straightforward numpy implementation
- c: llama.cpp pure-Python implementation
- d: llama.cpp straightforward numpy implementation
- e: llama.cpp optimized numpy implementation
a_duration = 45.63701820373535
b_duration = 1.6752548217773438
c_duration = 67.41351199150085
d_duration = 3.273803234100342
e_duration = 1.6066546440124512
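Putting those numbers in ratio form (computed from the durations above): e is roughly 45.64 / 1.61 ≈ 28x faster than a, 67.41 / 1.61 ≈ 42x faster than c, and 3.27 / 1.61 ≈ 2.0x faster than d (the factor of 2x noted earlier); b is within about 4% of e but produces NaNs.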
The complete output shows that the results of all relevant functions (a, c, d, e; not b, which produces NaNs) are consistent.
Complete benchmarking output
llama = <llama_cpp.llama.Llama object at 0x7fd7edb3fc40>
prompt = b'A large language model (LLM) is a type of language model notable for its ability to achieve general-purpose language understanding and generation. LLMs acquire these abilities by using massive amounts of data to learn billions of parameters during training and consuming large computational resources during their training and operation. LLMs are artificial neural networks (mainly transformers) and are (pre-)trained using self-supervised learning and semi-supervised learning.\nAs autoregressive language models, they work by taking an input text and repeatedly predicting the next token or word. Up to 2020, fine tuning was the only way a model could be adapted to be able to accomplish specific tasks. Larger sized models, such as GPT-3, however, can be prompt-engineered to achieve similar results. They are thought to acquire knowledge about syntax, semantics and "ontology" inherent in human language corpora, but also inaccuracies and biases present in the corpora.Notable examples include OpenAI\'s GPT models (e.g., GPT-3.5 and GPT-4, used in ChatGPT), Google\'s PaLM (used in Bard), and Meta\'s LLaMa, as well as BLOOM, Ernie 3.0 Titan, and Anthropic\'s Claude 2.\n\n\n== Dataset preprocessing ==\n\n\n=== Probabilistic tokenization ===\nUsing a modification of byte-pair encoding, in the first step, all unique characters (including blanks and punctuation marks) are treated as an initial set of n-grams (i.e. initial set of uni-grams). Successively the most frequent pair of adjacent characters is merged into a bi-gram and all instances of the pair are replaced by it. All occurrences of adjacent pairs of (previously merged) n-grams that most frequently occur together are then again merged into even lengthier n-gram repeatedly until a vocabulary of prescribed size is obtained (in case of GPT-3, the size is 50257). Token vocabulary consists of integers, spanning from zero up to the size of the token vocabulary. New '
token_ids = [1, 36, 10021, 16340, 5550, 375, 17368, 48, 12, 632, 267, 4105, 461, 16340, 5550, 52730, 613, 3776, 34447, 427, 28628, 5827, 5088, 248879, 16340, 32391, 530, 44677, 17, 67149, 23099, 117402, 4657, 147910, 1331, 3936, 60157, 73169, 461, 3030, 427, 20296, 238154, 461, 19434, 9411, 24078, 530, 168482, 10021, 129262, 20593, 9411, 3808, 24078, 530, 22511, 17, 67149, 23099, 1306, 48763, 87958, 49653, 375, 11351, 999, 9521, 525, 12, 530, 1306, 375, 8602, 16, 12, 454, 61873, 3936, 3676, 3711, 170548, 17930, 26002, 530, 74584, 3711, 170548, 17930, 26002, 17, 189, 3700, 5787, 1047, 2731, 1068, 16340, 20038, 15, 3291, 2909, 1331, 20956, 660, 9437, 5484, 530, 116466, 215947, 368, 9585, 31346, 791, 14679, 17, 19763, 427, 3566, 15, 15977, 185930, 1620, 368, 3804, 4676, 267, 5550, 4984, 722, 94433, 427, 722, 11045, 427, 83752, 11029, 45109, 17, 499, 25566, 191956, 20038, 15, 5067, 661, 602, 12592, 7265, 15, 14789, 15, 1400, 722, 39841, 16, 24664, 44144, 427, 28628, 9728, 9649, 17, 12941, 1306, 15776, 427, 117402, 25206, 3638, 38717, 15, 182847, 530, 567, 857, 8007, 5, 150085, 361, 7384, 16340, 198517, 15, 1965, 3466, 216509, 142506, 530, 12008, 262, 3344, 361, 368, 198517, 17, 4309, 5138, 34047, 13756, 17226, 19540, 1256, 602, 12592, 20038, 375, 72, 17, 74, 4604, 602, 12592, 7265, 17, 24, 530, 602, 12592, 9571, 15, 4853, 361, 92323, 42, 12592, 1013, 8943, 1256, 5590, 101577, 375, 13762, 361, 141743, 1013, 530, 74696, 1256, 499, 1980, 20620, 15, 661, 6355, 661, 490, 8365, 4387, 15, 40144, 641, 735, 17, 19, 110257, 15, 530, 227639, 322, 1256, 82095, 415, 336, 603, 1872, 524, 87947, 1165, 121392, 3535, 16783, 53048, 75048, 3236, 16741, 31346, 6926, 18066, 189, 39312, 267, 49582, 461, 28880, 16, 37769, 39569, 15, 361, 368, 3968, 17887, 15, 1728, 19826, 26702, 375, 80845, 2949, 11081, 530, 5755, 217214, 68945, 12, 1306, 42566, 661, 660, 11824, 1907, 461, 294, 16, 2789, 86, 375, 76, 17, 72, 17, 11824, 1907, 461, 140576, 16, 2789, 86, 1216, 105900, 13248, 368, 6084, 46730, 28629, 461, 70711, 26702, 632, 117287, 3727, 267, 5444, 16, 2789, 530, 1728, 41229, 461, 368, 28629, 1306, 48430, 1331, 718, 17, 8913, 178410, 461, 70711, 47395, 461, 375, 8602, 86599, 117287, 12, 294, 16, 2789, 86, 861, 6084, 57417, 25078, 15564, 1306, 3816, 5734, 117287, 3727, 6582, 13728, 1058, 294, 16, 2789, 116466, 14779, 267, 195197, 461, 156601, 7869, 632, 23003, 375, 265, 4462, 461, 602, 12592, 7265, 15, 368, 7869, 632, 115486, 7021, 1216, 100169, 195197, 52777, 461, 76944, 15, 1999, 24633, 1485, 18468, 2256, 427, 368, 7869, 461, 368, 31346, 195197, 17, 5161, 210]
llama.scores = array([[ -1.9024081 , 5.6636763 , 9.934613 , ..., -0.9375849 ,
-0.8766303 , -0.87667465],
[317.3427 , 323.53552 , 328.9913 , ..., 189.33157 ,
188.86066 , 188.85739 ],
[406.1632 , 407.21497 , 418.58017 , ..., 216.06926 ,
215.58969 , 215.59052 ],
...,
[ 0. , 0. , 0. , ..., 0. ,
0. , 0. ],
[ 0. , 0. , 0. , ..., 0. ,
0. , 0. ],
[ 0. , 0. , 0. , ..., 0. ,
0. , 0. ]], dtype=float32)
llama.scores.shape = (1024, 250880)
a: -5.367943 b: -5.367944 c: -5.367943 d: -5.367943 e: -5.367943 b'A'
/tmp/justUNIbvV/bar:197: RuntimeWarning: overflow encountered in exp
exps = np.exp(scores, dtype=np.single)
/tmp/justUNIbvV/bar:199: RuntimeWarning: invalid value encountered in divide
return np.log(np.divide(exps, sum_exps))
/tmp/justUNIbvV/bar:199: RuntimeWarning: divide by zero encountered in log
return np.log(np.divide(exps, sum_exps))
/tmp/justUNIbvV/bar:213: RuntimeWarning: divide by zero encountered in log
logprobs = np.log(probs)
/tmp/justUNIbvV/bar:222: RuntimeWarning: divide by zero encountered in log
np.log(tmp, out=tmp)
a: -9.211274 b: nan c: -9.211274 d: -9.211274 e: -9.211274 b' large'
a: -8.448094 b: nan c: -8.448094 d: -8.448094 e: -8.448094 b' language'
a: -4.773893 b: nan c: -4.773893 d: -4.773893 e: -4.773893 b' model'
a: -3.454623 b: nan c: -3.454623 d: -3.454623 e: -3.454623 b' ('
a: -6.340484 b: nan c: -6.340484 d: -6.340484 e: -6.340484 b'LL'
a: -0.023981 b: nan c: -0.023981 d: -0.023982 e: -0.023982 b'M'
a: -0.049261 b: nan c: -0.049261 d: -0.049261 e: -0.049261 b')'
a: -1.234106 b: nan c: -1.234106 d: -1.234106 e: -1.234106 b' is'
/opt/LLMs/bloom-560m/venv/lib/python3.10/site-packages/numpy/core/_methods.py:49: RuntimeWarning: overflow encountered in reduce
return umr_sum(a, axis, dtype, out, keepdims, initial, where)
a: -3.398765 b: nan c: -3.398765 d: -3.398765 e: -3.398765 b' a'
a: -4.051744 b: nan c: -4.051744 d: -4.051744 e: -4.051744 b' type'
a: -0.436684 b: nan c: -0.436684 d: -0.436684 e: -0.436684 b' of'
a: -2.718504 b: nan c: -2.718504 d: -2.718504 e: -2.718504 b' language'
a: -0.878869 b: nan c: -0.878869 d: -0.878870 e: -0.878870 b' model'
a: -13.767629 b: nan c: -13.767629 d: -13.767629 e: -13.767629 b' notable'
a: -0.414839 b: nan c: -0.414839 d: -0.414839 e: -0.414839 b' for'
a: -0.711662 b: nan c: -0.711662 d: -0.711662 e: -0.711662 b' its'
a: -3.055207 b: nan c: -3.055207 d: -3.055207 e: -3.055207 b' ability'
a: -0.119380 b: nan c: -0.119380 d: -0.119381 e: -0.119381 b' to'
a: -6.023344 b: nan c: -6.023344 d: -6.023344 e: -6.023344 b' achieve'
a: -6.019740 b: nan c: -6.019740 d: -6.019741 e: -6.019741 b' general'
a: -2.655617 b: nan c: -2.655617 d: -2.655617 e: -2.655617 b'-p'
a: -0.003393 b: nan c: -0.003393 d: -0.003393 e: -0.003393 b'urpose'
a: -2.892472 b: nan c: -2.892472 d: -2.892472 e: -2.892472 b' language'
a: -1.436164 b: nan c: -1.436164 d: -1.436164 e: -1.436164 b' understanding'
a: -3.359024 b: nan c: -3.359024 d: -3.359024 e: -3.359024 b' and'
a: -6.046928 b: nan c: -6.046928 d: -6.046928 e: -6.046928 b' generation'
a: -0.778185 b: nan c: -0.778185 d: -0.778185 e: -0.778185 b'.'
a: -4.106206 b: nan c: -4.106206 d: -4.106205 e: -4.106205 b' LL'
a: -0.236571 b: nan c: -0.236571 d: -0.236571 e: -0.236571 b'Ms'
a: -10.396901 b: nan c: -10.396901 d: -10.396900 e: -10.396900 b' acquire'
a: -4.239930 b: nan c: -4.239930 d: -4.239930 e: -4.239930 b' these'
a: -2.468247 b: nan c: -2.468247 d: -2.468247 e: -2.468247 b' abilities'
a: -0.911015 b: nan c: -0.911015 d: -0.911015 e: -0.911015 b' by'
a: -3.224710 b: nan c: -3.224710 d: -3.224710 e: -3.224710 b' using'
a: -7.119808 b: nan c: -7.119808 d: -7.119808 e: -7.119808 b' massive'
a: -1.583508 b: nan c: -1.583508 d: -1.583508 e: -1.583508 b' amounts'
a: -0.007975 b: nan c: -0.007975 d: -0.007975 e: -0.007975 b' of'
a: -0.823093 b: nan c: -0.823093 d: -0.823094 e: -0.823094 b' data'
a: -4.468983 b: nan c: -4.468983 d: -4.468983 e: -4.468983 b' to'
a: -2.854365 b: nan c: -2.854365 d: -2.854366 e: -2.854366 b' learn'
a: -9.325324 b: nan c: -9.325324 d: -9.325323 e: -9.325323 b' billions'
a: -0.084403 b: nan c: -0.084403 d: -0.084403 e: -0.084403 b' of'
a: -5.722558 b: nan c: -5.722558 d: -5.722558 e: -5.722558 b' parameters'
a: -6.927736 b: nan c: -6.927736 d: -6.927736 e: -6.927736 b' during'
a: -1.155214 b: nan c: -1.155214 d: -1.155214 e: -1.155214 b' training'
a: -4.249696 b: nan c: -4.249696 d: -4.249696 e: -4.249696 b' and'
a: -12.140764 b: nan c: -12.140764 d: -12.140764 e: -12.140764 b' consuming'
a: -3.625996 b: nan c: -3.625996 d: -3.625996 e: -3.625996 b' large'
a: -5.874669 b: nan c: -5.874669 d: -5.874670 e: -5.874670 b' computational'
a: -0.042336 b: nan c: -0.042336 d: -0.042335 e: -0.042335 b' resources'
a: -2.813871 b: nan c: -2.813871 d: -2.813871 e: -2.813871 b' during'
a: -7.389079 b: nan c: -7.389079 d: -7.389079 e: -7.389079 b' their'
a: -2.935715 b: nan c: -2.935715 d: -2.935714 e: -2.935714 b' training'
a: -3.790351 b: nan c: -3.790351 d: -3.790351 e: -3.790351 b' and'
a: -8.418568 b: nan c: -8.418568 d: -8.418568 e: -8.418568 b' operation'
a: -0.887175 b: nan c: -0.887175 d: -0.887175 e: -0.887175 b'.'
a: -3.584403 b: nan c: -3.584403 d: -3.584403 e: -3.584403 b' LL'
a: -0.153291 b: nan c: -0.153291 d: -0.153292 e: -0.153292 b'Ms'
a: -3.057801 b: nan c: -3.057801 d: -3.057801 e: -3.057801 b' are'
a: -9.531555 b: nan c: -9.531555 d: -9.531554 e: -9.531554 b' artificial'
a: -1.795445 b: nan c: -1.795445 d: -1.795445 e: -1.795445 b' neural'
a: -1.873023 b: nan c: -1.873023 d: -1.873023 e: -1.873023 b' networks'
a: -2.981514 b: nan c: -2.981514 d: -2.981514 e: -2.981514 b' ('
a: -12.873079 b: nan c: -12.873079 d: -12.873079 e: -12.873079 b'main'
a: -0.006427 b: nan c: -0.006427 d: -0.006426 e: -0.006426 b'ly'
a: -10.676861 b: nan c: -10.676861 d: -10.676861 e: -10.676861 b' transform'
a: -1.968519 b: nan c: -1.968519 d: -1.968520 e: -1.968520 b'ers'
a: -0.109352 b: nan c: -0.109352 d: -0.109352 e: -0.109352 b')'
a: -4.776477 b: nan c: -4.776477 d: -4.776477 e: -4.776477 b' and'
a: -3.597317 b: nan c: -3.597317 d: -3.597317 e: -3.597317 b' are'
a: -9.188688 b: nan c: -9.188688 d: -9.188688 e: -9.188688 b' ('
a: -7.147460 b: nan c: -7.147460 d: -7.147460 e: -7.147460 b'pre'
a: -1.609808 b: nan c: -1.609808 d: -1.609808 e: -1.609808 b'-'
a: -0.074441 b: nan c: -0.074441 d: -0.074440 e: -0.074440 b')'
a: -4.861610 b: nan c: -4.861610 d: -4.861610 e: -4.861610 b'tr'
a: -0.030640 b: nan c: -0.030640 d: -0.030640 e: -0.030640 b'ained'
a: -2.436419 b: nan c: -2.436419 d: -2.436419 e: -2.436419 b' using'
a: -11.050475 b: nan c: -11.050475 d: -11.050475 e: -11.050475 b' self'
a: -6.723314 b: nan c: -6.723314 d: -6.723314 e: -6.723314 b'-s'
a: -1.884679 b: nan c: -1.884679 d: -1.884679 e: -1.884679 b'uperv'
a: -0.209823 b: nan c: -0.209823 d: -0.209823 e: -0.209823 b'ised'
a: -0.344791 b: nan c: -0.344791 d: -0.344791 e: -0.344791 b' learning'
a: -5.031610 b: nan c: -5.031610 d: -5.031610 e: -5.031610 b' and'
a: -5.803086 b: nan c: -5.803086 d: -5.803085 e: -5.803085 b' semi'
a: -0.004210 b: nan c: -0.004210 d: -0.004210 e: -0.004210 b'-s'
a: -0.000367 b: nan c: -0.000367 d: -0.000367 e: -0.000367 b'uperv'
a: -0.004105 b: nan c: -0.004105 d: -0.004105 e: -0.004105 b'ised'
a: -0.083150 b: nan c: -0.083150 d: -0.083149 e: -0.083149 b' learning'
a: -0.316670 b: nan c: -0.316670 d: -0.316670 e: -0.316670 b'.'
a: -8.383088 b: nan c: -8.383088 d: -8.383087 e: -8.383087 b'\n'
a: -4.904536 b: nan c: -4.904536 d: -4.904536 e: -4.904536 b'As'
a: -10.636754 b: nan c: -10.636754 d: -10.636753 e: -10.636753 b' autor'
a: -0.114179 b: nan c: -0.114179 d: -0.114179 e: -0.114179 b'eg'
a: -0.384609 b: nan c: -0.384609 d: -0.384609 e: -0.384609 b'ress'
a: -0.024401 b: nan c: -0.024401 d: -0.024400 e: -0.024400 b'ive'
a: -6.012336 b: nan c: -6.012336 d: -6.012336 e: -6.012336 b' language'
a: -0.104443 b: nan c: -0.104443 d: -0.104443 e: -0.104443 b' models'
a: -3.506413 b: nan c: -3.506413 d: -3.506413 e: -3.506413 b','
a: -2.394819 b: nan c: -2.394819 d: -2.394819 e: -2.394819 b' they'
a: -5.698679 b: nan c: -5.698679 d: -5.698679 e: -5.698679 b' work'
a: -2.301849 b: nan c: -2.301849 d: -2.301849 e: -2.301849 b' by'
a: -3.550005 b: nan c: -3.550005 d: -3.550005 e: -3.550005 b' taking'
a: -3.942695 b: nan c: -3.942695 d: -3.942695 e: -3.942695 b' an'
a: -0.378508 b: nan c: -0.378508 d: -0.378508 e: -0.378508 b' input'
a: -1.609821 b: nan c: -1.609821 d: -1.609821 e: -1.609821 b' text'
a: -2.052005 b: nan c: -2.052005 d: -2.052005 e: -2.052005 b' and'
a: -9.787199 b: nan c: -9.787199 d: -9.787198 e: -9.787198 b' repeatedly'
a: -3.044364 b: nan c: -3.044364 d: -3.044364 e: -3.044364 b' predicting'
a: -2.154076 b: nan c: -2.154076 d: -2.154076 e: -2.154076 b' the'
a: -1.368177 b: nan c: -1.368177 d: -1.368177 e: -1.368177 b' next'
a: -3.798854 b: nan c: -3.798854 d: -3.798854 e: -3.798854 b' token'
a: -3.504242 b: nan c: -3.504242 d: -3.504242 e: -3.504242 b' or'
a: -1.951720 b: nan c: -1.951720 d: -1.951720 e: -1.951720 b' word'
a: -1.467691 b: nan c: -1.467691 d: -1.467691 e: -1.467691 b'.'
a: -8.371155 b: nan c: -8.371155 d: -8.371155 e: -8.371155 b' Up'
a: -1.930106 b: nan c: -1.930106 d: -1.930105 e: -1.930105 b' to'
a: -11.264959 b: nan c: -11.264959 d: -11.264958 e: -11.264958 b' 2020'
a: -1.725528 b: nan c: -1.725528 d: -1.725528 e: -1.725528 b','
a: -9.192338 b: nan c: -9.192338 d: -9.192338 e: -9.192338 b' fine'
a: -6.510423 b: nan c: -6.510423 d: -6.510423 e: -6.510423 b' tuning'
a: -3.739128 b: nan c: -3.739128 d: -3.739128 e: -3.739128 b' was'
a: -2.195986 b: nan c: -2.195986 d: -2.195985 e: -2.195985 b' the'
a: -2.049318 b: nan c: -2.049318 d: -2.049318 e: -2.049318 b' only'
a: -1.418543 b: nan c: -1.418543 d: -1.418543 e: -1.418543 b' way'
a: -7.099220 b: nan c: -7.099220 d: -7.099220 e: -7.099220 b' a'
a: -3.027331 b: nan c: -3.027331 d: -3.027331 e: -3.027331 b' model'
a: -0.354169 b: nan c: -0.354169 d: -0.354169 e: -0.354169 b' could'
a: -3.314866 b: nan c: -3.314866 d: -3.314866 e: -3.314866 b' be'
a: -5.803183 b: nan c: -5.803183 d: -5.803182 e: -5.803182 b' adapted'
a: -1.205845 b: nan c: -1.205845 d: -1.205845 e: -1.205845 b' to'
a: -5.716505 b: nan c: -5.716505 d: -5.716505 e: -5.716505 b' be'
a: -2.568644 b: nan c: -2.568644 d: -2.568644 e: -2.568644 b' able'
a: -0.053252 b: nan c: -0.053252 d: -0.053251 e: -0.053251 b' to'
a: -6.832457 b: nan c: -6.832457 d: -6.832457 e: -6.832457 b' accomplish'
a: -6.551181 b: nan c: -6.551181 d: -6.551180 e: -6.551180 b' specific'
a: -0.138228 b: nan c: -0.138228 d: -0.138228 e: -0.138228 b' tasks'
a: -0.184915 b: nan c: -0.184915 d: -0.184915 e: -0.184915 b'.'
a: -6.777157 b: nan c: -6.777157 d: -6.777157 e: -6.777157 b' L'
a: -1.435120 b: nan c: -1.435120 d: -1.435120 e: -1.435120 b'arger'
a: -7.833315 b: nan c: -7.833315 d: -7.833315 e: -7.833315 b' sized'
a: -1.126560 b: nan c: -1.126560 d: -1.126560 e: -1.126560 b' models'
a: -4.893678 b: nan c: -4.893678 d: -4.893677 e: -4.893677 b','
a: -1.398410 b: nan c: -1.398410 d: -1.398409 e: -1.398409 b' such'
a: -0.011862 b: nan c: -0.011862 d: -0.011862 e: -0.011862 b' as'
a: -4.611034 b: nan c: -4.611034 d: -4.611034 e: -4.611034 b' G'
a: -6.461701 b: nan c: -6.461701 d: -6.461701 e: -6.461701 b'PT'
a: -5.873858 b: nan c: -5.873858 d: -5.873858 e: -5.873858 b'-3'
a: -4.473673 b: nan c: -4.473673 d: -4.473673 e: -4.473673 b','
a: -7.415085 b: nan c: -7.415085 d: -7.415085 e: -7.415085 b' however'
a: -0.660664 b: nan c: -0.660664 d: -0.660664 e: -0.660664 b','
a: -3.812538 b: nan c: -3.812538 d: -3.812538 e: -3.812538 b' can'
a: -3.049508 b: nan c: -3.049508 d: -3.049508 e: -3.049508 b' be'
a: -13.477192 b: nan c: -13.477192 d: -13.477192 e: -13.477192 b' prompt'
a: -9.535336 b: nan c: -9.535336 d: -9.535336 e: -9.535336 b'-'
a: -8.362512 b: nan c: -8.362512 d: -8.362512 e: -8.362512 b'engine'
a: -0.023734 b: nan c: -0.023734 d: -0.023734 e: -0.023734 b'ered'
a: -1.997923 b: nan c: -1.997923 d: -1.997923 e: -1.997923 b' to'
a: -3.107490 b: nan c: -3.107490 d: -3.107490 e: -3.107490 b' achieve'
a: -4.664283 b: nan c: -4.664283 d: -4.664283 e: -4.664283 b' similar'
a: -2.499724 b: nan c: -2.499724 d: -2.499724 e: -2.499724 b' results'
a: -0.827204 b: nan c: -0.827204 d: -0.827205 e: -0.827205 b'.'
a: -3.769433 b: nan c: -3.769433 d: -3.769433 e: -3.769433 b' They'
a: -3.203774 b: nan c: -3.203774 d: -3.203774 e: -3.203774 b' are'
a: -8.417553 b: nan c: -8.417553 d: -8.417553 e: -8.417553 b' thought'
a: -0.521153 b: nan c: -0.521153 d: -0.521153 e: -0.521153 b' to'
a: -8.454661 b: nan c: -8.454661 d: -8.454661 e: -8.454661 b' acquire'
a: -4.176726 b: nan c: -4.176726 d: -4.176726 e: -4.176726 b' knowledge'
a: -0.663222 b: nan c: -0.663222 d: -0.663222 e: -0.663222 b' about'
a: -7.855154 b: nan c: -7.855154 d: -7.855154 e: -7.855154 b' syntax'
a: -4.059312 b: nan c: -4.059312 d: -4.059312 e: -4.059312 b','
a: -1.056918 b: nan c: -1.056918 d: -1.056918 e: -1.056918 b' semantics'
a: -1.699821 b: nan c: -1.699821 d: -1.699821 e: -1.699821 b' and'
a: -9.854226 b: nan c: -9.854226 d: -9.854226 e: -9.854226 b' "'
a: -9.568123 b: nan c: -9.568123 d: -9.568123 e: -9.568123 b'ont'
a: -0.618908 b: nan c: -0.618908 d: -0.618908 e: -0.618908 b'ology'
a: -0.044653 b: nan c: -0.044653 d: -0.044653 e: -0.044653 b'"'
a: -9.352193 b: nan c: -9.352193 d: -9.352193 e: -9.352193 b' inherent'
a: -0.316627 b: nan c: -0.316627 d: -0.316627 e: -0.316627 b' in'
a: -6.039067 b: nan c: -6.039067 d: -6.039067 e: -6.039067 b' human'
a: -0.600106 b: nan c: -0.600106 d: -0.600106 e: -0.600106 b' language'
a: -8.476008 b: nan c: -8.476008 d: -8.476008 e: -8.476008 b' corpora'
a: -3.708569 b: nan c: -3.708569 d: -3.708569 e: -3.708569 b','
a: -5.688789 b: nan c: -5.688789 d: -5.688789 e: -5.688789 b' but'
a: -4.589791 b: nan c: -4.589791 d: -4.589791 e: -4.589791 b' also'
a: -11.823556 b: nan c: -11.823556 d: -11.823556 e: -11.823556 b' inaccur'
a: -2.486506 b: nan c: -2.486506 d: -2.486506 e: -2.486506 b'acies'
a: -4.074617 b: nan c: -4.074617 d: -4.074617 e: -4.074617 b' and'
a: -6.034516 b: nan c: -6.034516 d: -6.034516 e: -6.034516 b' bias'
a: -0.120722 b: nan c: -0.120722 d: -0.120722 e: -0.120722 b'es'
a: -4.798411 b: nan c: -4.798411 d: -4.798411 e: -4.798411 b' present'
a: -0.238738 b: nan c: -0.238738 d: -0.238738 e: -0.238738 b' in'
a: -3.628764 b: nan c: -3.628764 d: -3.628764 e: -3.628764 b' the'
a: -4.484479 b: nan c: -4.484479 d: -4.484479 e: -4.484479 b' corpora'
a: -0.330885 b: nan c: -0.330885 d: -0.330885 e: -0.330885 b'.'
a: -15.756860 b: nan c: -15.756860 d: -15.756860 e: -15.756860 b'No'
a: -2.463265 b: nan c: -2.463265 d: -2.463265 e: -2.463265 b'table'
a: -1.729138 b: nan c: -1.729138 d: -1.729138 e: -1.729138 b' examples'
a: -1.325971 b: nan c: -1.325971 d: -1.325971 e: -1.325971 b' include'
a: -8.920161 b: nan c: -8.920161 d: -8.920161 e: -8.920161 b' Open'
a: -6.889237 b: nan c: -6.889237 d: -6.889237 e: -6.889237 b'AI'
a: -1.667497 b: nan c: -1.667497 d: -1.667497 e: -1.667497 b"'s"
a: -3.468127 b: nan c: -3.468127 d: -3.468127 e: -3.468127 b' G'
a: -0.155491 b: nan c: -0.155491 d: -0.155490 e: -0.155490 b'PT'
a: -9.349035 b: nan c: -9.349035 d: -9.349035 e: -9.349035 b' models'
a: -2.142191 b: nan c: -2.142191 d: -2.142191 e: -2.142191 b' ('
a: -6.530726 b: nan c: -6.530726 d: -6.530725 e: -6.530725 b'e'
a: -0.038004 b: nan c: -0.038004 d: -0.038004 e: -0.038004 b'.'
a: -0.001891 b: nan c: -0.001891 d: -0.001891 e: -0.001891 b'g'
a: -1.292955 b: nan c: -1.292955 d: -1.292955 e: -1.292955 b'.,'
a: -2.107854 b: nan c: -2.107854 d: -2.107854 e: -2.107854 b' G'
a: -0.065227 b: nan c: -0.065227 d: -0.065227 e: -0.065227 b'PT'
a: -1.508945 b: nan c: -1.508945 d: -1.508945 e: -1.508945 b'-3'
a: -2.830958 b: nan c: -2.830958 d: -2.830958 e: -2.830958 b'.'
a: -2.436518 b: nan c: -2.436518 d: -2.436519 e: -2.436519 b'5'
a: -5.900762 b: nan c: -5.900762 d: -5.900762 e: -5.900762 b' and'
a: -0.304319 b: nan c: -0.304319 d: -0.304319 e: -0.304319 b' G'
a: -0.004185 b: nan c: -0.004185 d: -0.004185 e: -0.004185 b'PT'
a: -1.768496 b: nan c: -1.768496 d: -1.768496 e: -1.768496 b'-4'
a: -8.031993 b: nan c: -8.031993 d: -8.031993 e: -8.031993 b','
a: -6.739377 b: nan c: -6.739377 d: -6.739377 e: -6.739377 b' used'
a: -0.779801 b: nan c: -0.779801 d: -0.779801 e: -0.779801 b' in'
a: -9.579938 b: nan c: -9.579938 d: -9.579938 e: -9.579938 b' Chat'
a: -6.897430 b: nan c: -6.897430 d: -6.897429 e: -6.897429 b'G'
a: -2.746655 b: nan c: -2.746655 d: -2.746655 e: -2.746655 b'PT'
a: -2.714486 b: nan c: -2.714486 d: -2.714486 e: -2.714486 b'),'
a: -4.667047 b: nan c: -4.667047 d: -4.667047 e: -4.667047 b' Google'
a: -0.112059 b: nan c: -0.112059 d: -0.112059 e: -0.112059 b"'s"
a: -7.231744 b: nan c: -7.231744 d: -7.231744 e: -7.231744 b' Pa'
a: -9.501238 b: nan c: -9.501238 d: -9.501238 e: -9.501238 b'LM'
a: -0.429585 b: nan c: -0.429585 d: -0.429585 e: -0.429585 b' ('
a: -5.146717 b: nan c: -5.146717 d: -5.146717 e: -5.146717 b'used'
a: -0.183795 b: nan c: -0.183795 d: -0.183794 e: -0.183794 b' in'
a: -15.040811 b: nan c: -15.040811 d: -15.040811 e: -15.040811 b' Bard'
a: -3.363115 b: nan c: -3.363115 d: -3.363115 e: -3.363115 b'),'
a: -2.558873 b: nan c: -2.558873 d: -2.558873 e: -2.558873 b' and'
a: -4.944914 b: nan c: -4.944914 d: -4.944914 e: -4.944914 b' Meta'
a: -6.735223 b: nan c: -6.735223 d: -6.735223 e: -6.735223 b"'s"
a: -3.014152 b: nan c: -3.014152 d: -3.014152 e: -3.014152 b' L'
a: -9.980461 b: nan c: -9.980461 d: -9.980461 e: -9.980461 b'La'
a: -7.585458 b: nan c: -7.585458 d: -7.585458 e: -7.585458 b'Ma'
a: -6.683825 b: nan c: -6.683825 d: -6.683825 e: -6.683825 b','
a: -7.067109 b: nan c: -7.067109 d: -7.067109 e: -7.067109 b' as'
a: -0.245106 b: nan c: -0.245106 d: -0.245106 e: -0.245106 b' well'
a: -0.020978 b: nan c: -0.020978 d: -0.020978 e: -0.020978 b' as'
a: -4.787077 b: nan c: -4.787077 d: -4.787077 e: -4.787077 b' B'
a: -5.519316 b: nan c: -5.519316 d: -5.519315 e: -5.519315 b'LO'
a: -1.241315 b: nan c: -1.241315 d: -1.241315 e: -1.241315 b'OM'
a: -5.318388 b: nan c: -5.318388 d: -5.318388 e: -5.318388 b','
a: -13.117612 b: nan c: -13.117612 d: -13.117612 e: -13.117612 b' Ern'
a: -1.056907 b: nan c: -1.056907 d: -1.056907 e: -1.056907 b'ie'
a: -7.942470 b: nan c: -7.942470 d: -7.942470 e: -7.942470 b' 3'
a: -2.225304 b: nan c: -2.225304 d: -2.225304 e: -2.225304 b'.'
a: -0.668179 b: nan c: -0.668179 d: -0.668180 e: -0.668180 b'0'
a: -13.902852 b: nan c: -13.902852 d: -13.902852 e: -13.902852 b' Titan'
a: -3.053794 b: nan c: -3.053794 d: -3.053794 e: -3.053794 b','
a: -2.372967 b: nan c: -2.372967 d: -2.372967 e: -2.372967 b' and'
a: -10.522296 b: nan c: -10.522296 d: -10.522296 e: -10.522296 b' Anthrop'
a: -6.443346 b: nan c: -6.443346 d: -6.443346 e: -6.443346 b'ic'
a: -2.064039 b: nan c: -2.064039 d: -2.064039 e: -2.064039 b"'s"
a: -10.992498 b: nan c: -10.992498 d: -10.992498 e: -10.992498 b' Claude'
a: -5.346071 b: nan c: -5.346071 d: -5.346071 e: -5.346071 b' 2'
a: -4.554083 b: nan c: -4.554083 d: -4.554083 e: -4.554083 b'.\n'
a: -13.498063 b: nan c: -13.498063 d: -13.498063 e: -13.498063 b'\n\n'
a: -10.621262 b: nan c: -10.621262 d: -10.621263 e: -10.621263 b'=='
a: -7.883044 b: nan c: -7.883044 d: -7.883044 e: -7.883044 b' D'
a: -2.834494 b: nan c: -2.834494 d: -2.834494 e: -2.834494 b'ataset'
a: -9.421761 b: nan c: -9.421761 d: -9.421762 e: -9.421762 b' pre'
a: -0.827429 b: nan c: -0.827429 d: -0.827429 e: -0.827429 b'processing'
a: -1.165996 b: nan c: -1.165996 d: -1.165996 e: -1.165996 b' =='
a: -9.472489 b: nan c: -9.472489 d: -9.472489 e: -9.472489 b'\n\n\n'
a: -4.742021 b: nan c: -4.742021 d: -4.742021 e: -4.742021 b'==='
a: -9.971311 b: nan c: -9.971311 d: -9.971311 e: -9.971311 b' Prob'
a: -1.162413 b: nan c: -1.162413 d: -1.162413 e: -1.162413 b'abil'
a: -0.011185 b: nan c: -0.011185 d: -0.011185 e: -0.011185 b'istic'
a: -5.476724 b: nan c: -5.476724 d: -5.476725 e: -5.476725 b' token'
a: -0.708107 b: nan c: -0.708107 d: -0.708107 e: -0.708107 b'ization'
a: -1.079870 b: nan c: -1.079870 d: -1.079870 e: -1.079870 b' ==='
a: -0.157445 b: nan c: -0.157445 d: -0.157445 e: -0.157445 b'\n'
a: -5.219115 b: nan c: -5.219115 d: -5.219115 e: -5.219115 b'Using'
a: -3.383572 b: nan c: -3.383572 d: -3.383572 e: -3.383572 b' a'
a: -7.967370 b: nan c: -7.967370 d: -7.967370 e: -7.967370 b' modification'
a: -0.099843 b: nan c: -0.099843 d: -0.099843 e: -0.099843 b' of'
a: -12.049183 b: nan c: -12.049183 d: -12.049184 e: -12.049184 b' byte'
a: -3.746562 b: nan c: -3.746562 d: -3.746562 e: -3.746562 b'-'
a: -3.753868 b: nan c: -3.753868 d: -3.753868 e: -3.753868 b'pair'
a: -3.814237 b: nan c: -3.814237 d: -3.814238 e: -3.814238 b' encoding'
a: -3.529470 b: nan c: -3.529470 d: -3.529470 e: -3.529470 b','
a: -4.875571 b: nan c: -4.875571 d: -4.875571 e: -4.875571 b' in'
a: -4.729910 b: nan c: -4.729910 d: -4.729910 e: -4.729910 b' the'
a: -2.925047 b: nan c: -2.925047 d: -2.925047 e: -2.925047 b' first'
a: -3.066268 b: nan c: -3.066268 d: -3.066268 e: -3.066268 b' step'
a: -4.241026 b: nan c: -4.241026 d: -4.241026 e: -4.241026 b','
a: -3.296739 b: nan c: -3.296739 d: -3.296739 e: -3.296739 b' all'
a: -7.303405 b: nan c: -7.303405 d: -7.303405 e: -7.303405 b' unique'
a: -4.998149 b: nan c: -4.998149 d: -4.998149 e: -4.998149 b' characters'
a: -1.471050 b: nan c: -1.471050 d: -1.471050 e: -1.471050 b' ('
a: -2.434214 b: nan c: -2.434214 d: -2.434214 e: -2.434214 b'including'
a: -7.047697 b: nan c: -7.047697 d: -7.047697 e: -7.047697 b' bl'
a: -0.014783 b: nan c: -0.014783 d: -0.014783 e: -0.014783 b'anks'
a: -3.620189 b: nan c: -3.620189 d: -3.620189 e: -3.620189 b' and'
a: -1.747435 b: nan c: -1.747435 d: -1.747435 e: -1.747435 b' pun'
a: -0.124111 b: nan c: -0.124111 d: -0.124111 e: -0.124111 b'ctuation'
a: -4.570289 b: nan c: -4.570289 d: -4.570289 e: -4.570289 b' marks'
a: -0.006916 b: nan c: -0.006916 d: -0.006916 e: -0.006916 b')'
a: -0.349460 b: nan c: -0.349460 d: -0.349460 e: -0.349460 b' are'
a: -4.207152 b: nan c: -4.207152 d: -4.207152 e: -4.207152 b' treated'
a: -0.524893 b: nan c: -0.524893 d: -0.524893 e: -0.524893 b' as'
a: -2.482481 b: nan c: -2.482481 d: -2.482481 e: -2.482481 b' an'
a: -7.917991 b: nan c: -7.917991 d: -7.917991 e: -7.917991 b' initial'
a: -4.777965 b: nan c: -4.777965 d: -4.777966 e: -4.777966 b' set'
a: -0.964483 b: nan c: -0.964483 d: -0.964483 e: -0.964483 b' of'
a: -8.942179 b: nan c: -8.942179 d: -8.942179 e: -8.942179 b' n'
a: -7.141345 b: nan c: -7.141345 d: -7.141345 e: -7.141345 b'-'
a: -2.807656 b: nan c: -2.807656 d: -2.807656 e: -2.807656 b'gram'
a: -1.493124 b: nan c: -1.493124 d: -1.493124 e: -1.493124 b's'
a: -0.996502 b: nan c: -0.996502 d: -0.996502 e: -0.996502 b' ('
a: -2.220737 b: nan c: -2.220737 d: -2.220737 e: -2.220737 b'i'
a: -0.023608 b: nan c: -0.023608 d: -0.023607 e: -0.023607 b'.'
a: -0.004616 b: nan c: -0.004616 d: -0.004616 e: -0.004616 b'e'
a: -0.402904 b: nan c: -0.402904 d: -0.402903 e: -0.402903 b'.'
a: -6.809383 b: nan c: -6.809383 d: -6.809383 e: -6.809383 b' initial'
a: -5.668299 b: nan c: -5.668299 d: -5.668299 e: -5.668299 b' set'
a: -0.831633 b: nan c: -0.831633 d: -0.831632 e: -0.831632 b' of'
a: -13.157858 b: nan c: -13.157858 d: -13.157859 e: -13.157859 b' uni'
a: -7.490177 b: nan c: -7.490177 d: -7.490177 e: -7.490177 b'-'
a: -0.923227 b: nan c: -0.923227 d: -0.923227 e: -0.923227 b'gram'
a: -0.331740 b: nan c: -0.331740 d: -0.331740 e: -0.331740 b's'
a: -3.534358 b: nan c: -3.534358 d: -3.534358 e: -3.534358 b').'
a: -8.112888 b: nan c: -8.112888 d: -8.112887 e: -8.112887 b' Success'
a: -2.912493 b: nan c: -2.912493 d: -2.912493 e: -2.912493 b'ively'
a: -3.370563 b: nan c: -3.370563 d: -3.370563 e: -3.370563 b' the'
a: -5.828970 b: nan c: -5.828970 d: -5.828970 e: -5.828970 b' most'
a: -0.993331 b: nan c: -0.993331 d: -0.993332 e: -0.993332 b' frequent'
a: -5.206540 b: nan c: -5.206540 d: -5.206540 e: -5.206540 b' pair'
a: -1.452364 b: nan c: -1.452364 d: -1.452364 e: -1.452364 b' of'
a: -5.108283 b: nan c: -5.108283 d: -5.108283 e: -5.108283 b' adjacent'
a: -1.408771 b: nan c: -1.408771 d: -1.408771 e: -1.408771 b' characters'
a: -1.461254 b: nan c: -1.461254 d: -1.461254 e: -1.461254 b' is'
a: -5.858886 b: nan c: -5.858886 d: -5.858886 e: -5.858886 b' merged'
a: -1.780827 b: nan c: -1.780827 d: -1.780827 e: -1.780827 b' into'
a: -1.503654 b: nan c: -1.503654 d: -1.503654 e: -1.503654 b' a'
a: -7.183869 b: nan c: -7.183869 d: -7.183870 e: -7.183870 b' bi'
a: -4.349968 b: nan c: -4.349968 d: -4.349968 e: -4.349968 b'-'
a: -0.234041 b: nan c: -0.234041 d: -0.234041 e: -0.234041 b'gram'
a: -5.012111 b: nan c: -5.012111 d: -5.012111 e: -5.012111 b' and'
a: -3.104003 b: nan c: -3.104003 d: -3.104002 e: -3.104002 b' all'
a: -7.904882 b: nan c: -7.904882 d: -7.904882 e: -7.904882 b' instances'
a: -0.581809 b: nan c: -0.581809 d: -0.581809 e: -0.581809 b' of'
a: -2.792189 b: nan c: -2.792189 d: -2.792189 e: -2.792189 b' the'
a: -3.578452 b: nan c: -3.578452 d: -3.578452 e: -3.578452 b' pair'
a: -1.364585 b: nan c: -1.364585 d: -1.364585 e: -1.364585 b' are'
a: -5.230806 b: nan c: -5.230806 d: -5.230806 e: -5.230806 b' replaced'
a: -1.658925 b: nan c: -1.658925 d: -1.658925 e: -1.658925 b' by'
a: -6.258741 b: nan c: -6.258741 d: -6.258740 e: -6.258740 b' it'
a: -0.139437 b: nan c: -0.139437 d: -0.139437 e: -0.139437 b'.'
a: -4.663624 b: nan c: -4.663624 d: -4.663624 e: -4.663624 b' All'
a: -1.965869 b: nan c: -1.965869 d: -1.965869 e: -1.965869 b' occurrences'
a: -0.203550 b: nan c: -0.203550 d: -0.203550 e: -0.203550 b' of'
a: -4.942329 b: nan c: -4.942329 d: -4.942329 e: -4.942329 b' adjacent'
a: -2.225470 b: nan c: -2.225470 d: -2.225470 e: -2.225470 b' pairs'
a: -1.566181 b: nan c: -1.566181 d: -1.566181 e: -1.566181 b' of'
a: -6.806071 b: nan c: -6.806071 d: -6.806071 e: -6.806071 b' ('
a: -5.682212 b: nan c: -5.682212 d: -5.682212 e: -5.682212 b'pre'
a: -2.086540 b: nan c: -2.086540 d: -2.086540 e: -2.086540 b'viously'
a: -4.062069 b: nan c: -4.062069 d: -4.062068 e: -4.062068 b' merged'
a: -0.050500 b: nan c: -0.050500 d: -0.050500 e: -0.050500 b')'
a: -4.835000 b: nan c: -4.835000 d: -4.835001 e: -4.835001 b' n'
a: -0.836199 b: nan c: -0.836199 d: -0.836199 e: -0.836199 b'-'
a: -0.001644 b: nan c: -0.001644 d: -0.001644 e: -0.001644 b'gram'
a: -0.090786 b: nan c: -0.090786 d: -0.090786 e: -0.090786 b's'
a: -5.119175 b: nan c: -5.119175 d: -5.119175 e: -5.119175 b' that'
a: -8.598697 b: nan c: -8.598697 d: -8.598697 e: -8.598697 b' most'
a: -1.813356 b: nan c: -1.813356 d: -1.813356 e: -1.813356 b' frequently'
a: -0.143393 b: nan c: -0.143393 d: -0.143393 e: -0.143393 b' occur'
a: -1.869267 b: nan c: -1.869267 d: -1.869267 e: -1.869267 b' together'
a: -1.679995 b: nan c: -1.679995 d: -1.679995 e: -1.679995 b' are'
a: -3.012772 b: nan c: -3.012772 d: -3.012771 e: -3.012771 b' then'
a: -8.233466 b: nan c: -8.233466 d: -8.233466 e: -8.233466 b' again'
a: -0.555408 b: nan c: -0.555408 d: -0.555407 e: -0.555407 b' merged'
a: -0.548524 b: nan c: -0.548524 d: -0.548524 e: -0.548524 b' into'
a: -7.533364 b: nan c: -7.533364 d: -7.533364 e: -7.533364 b' even'
a: -10.055119 b: nan c: -10.055119 d: -10.055119 e: -10.055119 b' length'
a: -13.740651 b: nan c: -13.740651 d: -13.740651 e: -13.740651 b'ier'
a: -2.974525 b: nan c: -2.974525 d: -2.974524 e: -2.974524 b' n'
a: -0.494241 b: nan c: -0.494241 d: -0.494241 e: -0.494241 b'-'
a: -0.001839 b: nan c: -0.001839 d: -0.001839 e: -0.001839 b'gram'
a: -16.614477 b: nan c: -16.614477 d: -16.614477 e: -16.614477 b' repeatedly'
a: -1.601561 b: nan c: -1.601561 d: -1.601561 e: -1.601561 b' until'
a: -3.233735 b: nan c: -3.233735 d: -3.233735 e: -3.233735 b' a'
a: -8.700918 b: nan c: -8.700918 d: -8.700918 e: -8.700918 b' vocabulary'
a: -3.016562 b: nan c: -3.016562 d: -3.016562 e: -3.016562 b' of'
a: -11.269511 b: nan c: -11.269511 d: -11.269511 e: -11.269511 b' prescribed'
a: -0.769698 b: nan c: -0.769698 d: -0.769698 e: -0.769698 b' size'
a: -0.167740 b: nan c: -0.167740 d: -0.167740 e: -0.167740 b' is'
a: -1.090048 b: nan c: -1.090048 d: -1.090048 e: -1.090048 b' obtained'
a: -3.403722 b: nan c: -3.403722 d: -3.403722 e: -3.403722 b' ('
a: -2.906383 b: nan c: -2.906383 d: -2.906383 e: -2.906383 b'in'
a: -4.100207 b: nan c: -4.100207 d: -4.100207 e: -4.100207 b' case'
a: -0.983446 b: nan c: -0.983446 d: -0.983446 e: -0.983446 b' of'
a: -3.104256 b: nan c: -3.104256 d: -3.104256 e: -3.104256 b' G'
a: -0.126536 b: nan c: -0.126536 d: -0.126536 e: -0.126536 b'PT'
a: -0.616138 b: nan c: -0.616138 d: -0.616137 e: -0.616137 b'-3'
a: -3.218654 b: nan c: -3.218654 d: -3.218654 e: -3.218654 b','
a: -3.777549 b: nan c: -3.777549 d: -3.777549 e: -3.777549 b' the'
a: -2.190601 b: nan c: -2.190601 d: -2.190601 e: -2.190601 b' size'
a: -0.764579 b: nan c: -0.764579 d: -0.764579 e: -0.764579 b' is'
a: -12.514678 b: nan c: -12.514678 d: -12.514678 e: -12.514678 b' 502'
a: -8.363537 b: nan c: -8.363537 d: -8.363537 e: -8.363537 b'57'
a: -4.314742 b: nan c: -4.314742 d: -4.314743 e: -4.314743 b').'
a: -8.882359 b: nan c: -8.882359 d: -8.882359 e: -8.882359 b' Token'
a: -9.738820 b: nan c: -9.738820 d: -9.738820 e: -9.738820 b' vocabulary'
a: -4.400854 b: nan c: -4.400854 d: -4.400854 e: -4.400854 b' consists'
a: -0.109277 b: nan c: -0.109277 d: -0.109277 e: -0.109277 b' of'
a: -8.898475 b: nan c: -8.898475 d: -8.898475 e: -8.898475 b' integers'
a: -4.957717 b: nan c: -4.957717 d: -4.957717 e: -4.957717 b','
a: -9.118929 b: nan c: -9.118929 d: -9.118929 e: -9.118929 b' sp'
a: -0.665562 b: nan c: -0.665562 d: -0.665561 e: -0.665561 b'anning'
a: -1.588228 b: nan c: -1.588228 d: -1.588229 e: -1.588229 b' from'
a: -4.640015 b: nan c: -4.640015 d: -4.640015 e: -4.640015 b' zero'
a: -1.703651 b: nan c: -1.703651 d: -1.703651 e: -1.703651 b' up'
a: -0.143534 b: nan c: -0.143534 d: -0.143533 e: -0.143533 b' to'
a: -3.549545 b: nan c: -3.549545 d: -3.549545 e: -3.549545 b' the'
a: -2.210416 b: nan c: -2.210416 d: -2.210416 e: -2.210416 b' size'
a: -0.211880 b: nan c: -0.211880 d: -0.211880 e: -0.211880 b' of'
a: -1.076867 b: nan c: -1.076867 d: -1.076867 e: -1.076867 b' the'
a: -3.076723 b: nan c: -3.076723 d: -3.076723 e: -3.076723 b' token'
a: -3.423433 b: nan c: -3.423433 d: -3.423433 e: -3.423433 b' vocabulary'
a: -0.353134 b: nan c: -0.353134 d: -0.353134 e: -0.353134 b'.'
a: -7.996045 b: nan c: -7.996045 d: -7.996046 e: -7.996046 b' New'
a: -9.829884 b: nan c: -9.829884 d: -9.829884 e: -9.829884 b' '
the_logprobs = [5.3679433, 9.211274, 8.448094, 4.7738934, 3.4546232, 6.3404837, 0.023981515, 0.049261414, 1.2341062, 3.3987653, 4.0517445, 0.43668395, 2.718504, 0.87886953, 13.767629, 0.41483933, 0.71166176, 3.0552065, 0.119380556, 6.023344, 6.0197406, 2.6556172, 0.0033930219, 2.8924723, 1.4361639, 3.359024, 6.046928, 0.77818453, 4.1062055, 0.2365705, 10.3969, 4.23993, 2.4682472, 0.9110154, 3.22471, 7.1198077, 1.5835077, 0.007974568, 0.8230936, 4.4689827, 2.8543656, 9.325323, 0.08440277, 5.722558, 6.927736, 1.1552138, 4.249696, 12.140764, 3.625996, 5.8746696, 0.042335443, 2.8138714, 7.389079, 2.9357145, 3.7903514, 8.418568, 0.8871751, 3.5844026, 0.15329154, 3.057801, 9.531554, 1.7954447, 1.8730229, 2.9815145, 12.873079, 0.0064264955, 10.676861, 1.9685196, 0.10935174, 4.7764773, 3.5973167, 9.188688, 7.1474605, 1.609808, 0.0744403, 4.86161, 0.030639729, 2.436419, 11.050475, 6.7233143, 1.8846788, 0.20982331, 0.3447908, 5.03161, 5.8030853, 0.0042099636, 0.00036663574, 0.004105221, 0.083149485, 0.31666988, 8.383087, 4.9045362, 10.636753, 0.11417867, 0.38460934, 0.024400417, 6.0123363, 0.10444325, 3.506413, 2.394819, 5.698679, 2.301849, 3.5500052, 3.9426951, 0.3785084, 1.6098207, 2.052005, 9.787198, 3.0443645, 2.1540759, 1.3681772, 3.798854, 3.5042424, 1.9517196, 1.4676908, 8.371155, 1.9301053, 11.264958, 1.7255279, 9.192338, 6.5104227, 3.739128, 2.1959853, 2.0493176, 1.4185431, 7.0992203, 3.0273314, 0.35416928, 3.3148656, 5.803182, 1.205845, 5.716505, 2.5686443, 0.053251386, 6.832457, 6.5511804, 0.1382278, 0.18491524, 6.777157, 1.4351199, 7.833315, 1.1265603, 4.893677, 1.3984095, 0.011862322, 4.611034, 6.4617014, 5.8738585, 4.4736733, 7.415085, 0.66066384, 3.812538, 3.0495079, 13.477192, 9.5353365, 8.362512, 0.023733677, 1.9979228, 3.1074903, 4.664283, 2.4997244, 0.8272045, 3.769433, 3.2037737, 8.417553, 0.5211528, 8.454661, 4.176726, 0.6632216, 7.8551536, 4.0593123, 1.0569181, 1.699821, 9.854226, 9.568123, 0.618908, 0.04465261, 9.352193, 0.3166267, 6.039067, 0.6001057, 8.476008, 3.7085686, 5.6887894, 4.5897913, 11.823556, 2.4865057, 4.0746174, 6.0345163, 0.12072154, 4.7984114, 0.23873787, 3.6287637, 4.4844794, 0.330885, 15.75686, 2.4632654, 1.7291378, 1.3259709, 8.920161, 6.889237, 1.667497, 3.4681265, 0.15549047, 9.349035, 2.142191, 6.5307255, 0.03800402, 0.0018907768, 1.2929547, 2.1078541, 0.065226994, 1.5089453, 2.8309584, 2.4365187, 5.900762, 0.3043193, 0.0041852435, 1.768496, 8.031993, 6.739377, 0.7798013, 9.579938, 6.8974295, 2.7466547, 2.7144856, 4.667047, 0.112059176, 7.2317443, 9.501238, 0.42958507, 5.1467166, 0.18379445, 15.040811, 3.363115, 2.558873, 4.9449143, 6.735223, 3.0141523, 9.980461, 7.5854583, 6.683825, 7.0671086, 0.24510594, 0.020978458, 4.787077, 5.5193152, 1.2413146, 5.318388, 13.117612, 1.056907, 7.94247, 2.2253036, 0.6681796, 13.902852, 3.0537941, 2.3729672, 10.522296, 6.443346, 2.0640392, 10.992498, 5.3460712, 4.5540833, 13.498063, 10.621263, 7.8830442, 2.8344936, 9.4217615, 0.8274293, 1.1659956, 9.472489, 4.7420206, 9.971311, 1.1624134, 0.011184601, 5.4767246, 0.7081068, 1.07987, 0.15744506, 5.2191153, 3.383572, 7.9673696, 0.0998429, 12.049184, 3.7465625, 3.753868, 3.8142376, 3.5294697, 4.875571, 4.7299104, 2.925047, 3.0662684, 4.2410264, 3.2967393, 7.303405, 4.9981494, 1.4710495, 2.4342136, 7.047697, 0.014783098, 3.620189, 1.7474347, 0.124110825, 4.5702887, 0.0069157053, 0.34946045, 4.207152, 0.5248925, 2.482481, 7.917991, 4.7779655, 0.9644831, 8.942179, 7.1413445, 2.8076563, 1.4931244, 0.99650246, 2.2207367, 0.023607463, 0.0046161097, 0.40290338, 6.809383, 
5.6682987, 0.83163244, 13.157859, 7.490177, 0.92322665, 0.33174023, 3.534358, 8.112887, 2.9124928, 3.3705628, 5.8289704, 0.99333155, 5.20654, 1.4523641, 5.108283, 1.4087713, 1.4612538, 5.858886, 1.7808268, 1.5036536, 7.18387, 4.3499684, 0.23404063, 5.0121107, 3.1040025, 7.904882, 0.5818087, 2.7921894, 3.5784519, 1.3645848, 5.2308064, 1.6589248, 6.2587404, 0.13943712, 4.6636243, 1.9658691, 0.2035498, 4.942329, 2.2254703, 1.5661812, 6.806071, 5.6822124, 2.08654, 4.0620685, 0.050500378, 4.8350005, 0.8361988, 0.0016443531, 0.09078605, 5.119175, 8.598697, 1.8133562, 0.14339331, 1.869267, 1.6799954, 3.0127714, 8.233466, 0.5554074, 0.5485237, 7.5333643, 10.055119, 13.740651, 2.9745245, 0.49424082, 0.0018388837, 16.614477, 1.6015614, 3.233735, 8.700918, 3.0165622, 11.269511, 0.7696982, 0.16774035, 1.0900477, 3.4037223, 2.9063828, 4.100207, 0.98344564, 3.1042562, 0.12653613, 0.6161373, 3.2186544, 3.777549, 2.1906009, 0.76457924, 12.514678, 8.363537, 4.3147426, 8.882359, 9.73882, 4.4008536, 0.10927674, 8.898475, 4.957717, 9.118929, 0.6655615, 1.5882287, 4.6400146, 1.7036512, 0.14353324, 3.5495453, 2.2104158, 0.2118803, 1.0768673, 3.0767229, 3.4234333, 0.35313383, 7.9960456, 9.829884]
the_scores = [9.9661, 327.28787, 420.0938, 414.89343, 430.0521, 413.4574, 423.2652, 442.54846, 435.2554, 437.8017, 431.02148, 433.2741, 431.7649, 431.51428, 427.96927, 441.21802, 438.807, 435.8473, 445.58362, 432.82965, 430.16278, 427.83655, 397.25702, 430.76993, 428.30133, 435.88394, 427.47998, 436.7354, 430.89755, 426.05185, 433.2636, 429.92786, 433.3849, 444.5641, 435.48438, 427.36658, 429.83832, 443.43927, 431.38348, 432.3806, 436.04163, 426.01312, 438.3796, 424.84964, 433.16382, 431.75513, 435.36856, 422.2344, 435.4593, 431.17828, 437.76086, 440.76373, 425.5141, 430.14972, 434.9566, 425.885, 438.10785, 432.32025, 425.99103, 440.97455, 431.0528, 428.14792, 426.62283, 439.96634, 397.99197, 436.242, 412.8427, 416.45172, 442.82642, 437.37714, 434.17706, 431.89462, 415.41266, 416.2669, 407.16718, 412.29828, 416.51144, 436.59647, 422.41342, 405.08075, 405.23425, 417.45532, 434.46692, 434.80292, 430.1206, 416.43787, 418.18295, 426.08838, 438.5788, 441.16104, 425.425, 413.8869, 423.3566, 411.97266, 416.88116, 431.5711, 422.79907, 427.00592, 439.4007, 432.7403, 437.81274, 438.91217, 434.58353, 432.41034, 429.0206, 430.18463, 428.58673, 428.2871, 431.40775, 436.2942, 433.5847, 429.84506, 434.73312, 430.6353, 436.3836, 426.47278, 424.96857, 426.36566, 432.07855, 426.71786, 420.0979, 432.44495, 440.1648, 435.80048, 436.2376, 431.7898, 428.8778, 439.78302, 436.8834, 433.27814, 442.3875, 431.35852, 434.80212, 444.55444, 431.7138, 429.20468, 436.48895, 444.4892, 428.24448, 410.77444, 424.29318, 431.88995, 438.8785, 438.52176, 443.3783, 423.9522, 396.84814, 420.23212, 422.19763, 427.4338, 443.34055, 440.14716, 438.25336, 426.6607, 428.66934, 403.28568, 436.54462, 439.01486, 438.95074, 434.22498, 435.82147, 442.6632, 431.75256, 440.26605, 431.70905, 440.70435, 434.2129, 433.1173, 439.5385, 429.6556, 431.8877, 428.7487, 441.61868, 420.90155, 399.026, 415.68112, 437.11334, 429.0783, 443.75314, 430.12402, 428.41772, 425.90833, 440.51715, 437.92023, 437.1477, 429.227, 435.65527, 440.26526, 430.25983, 441.00497, 439.45886, 442.99512, 431.82373, 431.2073, 444.5484, 419.25598, 426.47055, 430.59943, 442.73383, 424.4433, 404.83383, 425.2348, 429.83844, 418.84232, 423.7257, 436.5782, 414.7328, 425.1214, 439.81888, 441.42645, 431.98367, 424.39166, 431.0615, 431.6104, 427.7746, 428.96582, 435.6264, 428.87363, 432.45605, 430.37128, 428.8032, 441.77386, 423.49475, 411.4188, 405.3178, 429.94592, 427.53217, 425.50098, 423.30566, 402.34787, 429.79785, 415.5559, 442.8064, 418.8507, 419.05887, 428.546, 427.18094, 405.7302, 431.7412, 410.0255, 416.45795, 419.46707, 429.4912, 439.7502, 443.27972, 428.9561, 405.08713, 414.75406, 414.6021, 419.5348, 415.80518, 416.3062, 428.9465, 432.04514, 421.03552, 424.91028, 431.25922, 422.9465, 414.67615, 428.51855, 424.75494, 423.16605, 430.4146, 408.36087, 407.59338, 411.95807, 407.8208, 417.4074, 410.35776, 431.94446, 423.66174, 414.61597, 415.24768, 422.17963, 428.17062, 422.48978, 428.04797, 433.65988, 433.92078, 415.6643, 431.42474, 424.42792, 443.87408, 420.23734, 420.17477, 394.3496, 427.1672, 433.2087, 429.72147, 425.73334, 429.99158, 430.41156, 432.59665, 431.94678, 426.07498, 427.55298, 437.6648, 414.54663, 419.93073, 412.4449, 436.92407, 426.06598, 413.6284, 432.3405, 445.84595, 441.68152, 432.82486, 440.20126, 430.03894, 415.68427, 424.5505, 436.86633, 424.6437, 412.11578, 397.7491, 430.24622, 438.63525, 419.45282, 430.82797, 442.37125, 444.25372, 425.82544, 423.75784, 433.33862, 419.62543, 408.33215, 411.51947, 431.75537, 435.3877, 426.09415, 435.26398, 432.55237, 
428.47076, 430.4841, 426.13367, 432.49524, 428.49744, 430.97382, 437.40253, 431.71573, 442.11032, 436.8038, 425.79004, 407.95587, 406.0824, 430.94595, 435.3224, 426.71408, 439.92938, 433.17017, 432.0222, 436.2652, 432.49268, 443.333, 428.82367, 442.2268, 426.9667, 432.86597, 443.552, 429.84003, 431.6844, 440.89514, 426.99738, 413.30627, 415.31824, 427.53458, 437.4763, 428.30524, 423.2324, 420.21356, 436.4215, 438.7958, 433.05518, 435.47403, 439.6424, 440.57468, 441.48083, 436.14688, 431.00513, 438.39728, 443.93365, 432.79474, 422.68762, 418.94516, 432.13214, 424.85065, 424.5916, 422.7, 440.3304, 436.09534, 424.0968, 430.8927, 422.688, 436.15045, 438.68228, 439.09998, 438.97546, 422.01346, 427.10272, 435.844, 431.22925, 421.66595, 433.00668, 433.61127, 433.20422, 434.31158, 438.16455, 422.6285, 413.02704, 418.50302, 421.1029, 424.06674, 433.90265, 444.95532, 424.40997, 431.5473, 421.08575, 409.00424, 431.35825, 428.51935, 434.4775, 439.66968, 426.8051, 430.85864, 435.49084, 431.9904, 431.23456, 432.9321, 437.97174, 424.33398, 421.94818]
the_tokens = [b'A', b' large', b' language', b' model', b' (', b'LL', b'M', b')', b' is', b' a', b' type', b' of', b' language', b' model', b' notable', b' for', b' its', b' ability', b' to', b' achieve', b' general', b'-p', b'urpose', b' language', b' understanding', b' and', b' generation', b'.', b' LL', b'Ms', b' acquire', b' these', b' abilities', b' by', b' using', b' massive', b' amounts', b' of', b' data', b' to', b' learn', b' billions', b' of', b' parameters', b' during', b' training', b' and', b' consuming', b' large', b' computational', b' resources', b' during', b' their', b' training', b' and', b' operation', b'.', b' LL', b'Ms', b' are', b' artificial', b' neural', b' networks', b' (', b'main', b'ly', b' transform', b'ers', b')', b' and', b' are', b' (', b'pre', b'-', b')', b'tr', b'ained', b' using', b' self', b'-s', b'uperv', b'ised', b' learning', b' and', b' semi', b'-s', b'uperv', b'ised', b' learning', b'.', b'\n', b'As', b' autor', b'eg', b'ress', b'ive', b' language', b' models', b',', b' they', b' work', b' by', b' taking', b' an', b' input', b' text', b' and', b' repeatedly', b' predicting', b' the', b' next', b' token', b' or', b' word', b'.', b' Up', b' to', b' 2020', b',', b' fine', b' tuning', b' was', b' the', b' only', b' way', b' a', b' model', b' could', b' be', b' adapted', b' to', b' be', b' able', b' to', b' accomplish', b' specific', b' tasks', b'.', b' L', b'arger', b' sized', b' models', b',', b' such', b' as', b' G', b'PT', b'-3', b',', b' however', b',', b' can', b' be', b' prompt', b'-', b'engine', b'ered', b' to', b' achieve', b' similar', b' results', b'.', b' They', b' are', b' thought', b' to', b' acquire', b' knowledge', b' about', b' syntax', b',', b' semantics', b' and', b' "', b'ont', b'ology', b'"', b' inherent', b' in', b' human', b' language', b' corpora', b',', b' but', b' also', b' inaccur', b'acies', b' and', b' bias', b'es', b' present', b' in', b' the', b' corpora', b'.', b'No', b'table', b' examples', b' include', b' Open', b'AI', b"'s", b' G', b'PT', b' models', b' (', b'e', b'.', b'g', b'.,', b' G', b'PT', b'-3', b'.', b'5', b' and', b' G', b'PT', b'-4', b',', b' used', b' in', b' Chat', b'G', b'PT', b'),', b' Google', b"'s", b' Pa', b'LM', b' (', b'used', b' in', b' Bard', b'),', b' and', b' Meta', b"'s", b' L', b'La', b'Ma', b',', b' as', b' well', b' as', b' B', b'LO', b'OM', b',', b' Ern', b'ie', b' 3', b'.', b'0', b' Titan', b',', b' and', b' Anthrop', b'ic', b"'s", b' Claude', b' 2', b'.\n', b'\n\n', b'==', b' D', b'ataset', b' pre', b'processing', b' ==', b'\n\n\n', b'===', b' Prob', b'abil', b'istic', b' token', b'ization', b' ===', b'\n', b'Using', b' a', b' modification', b' of', b' byte', b'-', b'pair', b' encoding', b',', b' in', b' the', b' first', b' step', b',', b' all', b' unique', b' characters', b' (', b'including', b' bl', b'anks', b' and', b' pun', b'ctuation', b' marks', b')', b' are', b' treated', b' as', b' an', b' initial', b' set', b' of', b' n', b'-', b'gram', b's', b' (', b'i', b'.', b'e', b'.', b' initial', b' set', b' of', b' uni', b'-', b'gram', b's', b').', b' Success', b'ively', b' the', b' most', b' frequent', b' pair', b' of', b' adjacent', b' characters', b' is', b' merged', b' into', b' a', b' bi', b'-', b'gram', b' and', b' all', b' instances', b' of', b' the', b' pair', b' are', b' replaced', b' by', b' it', b'.', b' All', b' occurrences', b' of', b' adjacent', b' pairs', b' of', b' (', b'pre', b'viously', b' merged', b')', b' n', b'-', b'gram', b's', b' that', b' most', b' frequently', b' 
occur', b' together', b' are', b' then', b' again', b' merged', b' into', b' even', b' length', b'ier', b' n', b'-', b'gram', b' repeatedly', b' until', b' a', b' vocabulary', b' of', b' prescribed', b' size', b' is', b' obtained', b' (', b'in', b' case', b' of', b' G', b'PT', b'-3', b',', b' the', b' size', b' is', b' 502', b'57', b').', b' Token', b' vocabulary', b' consists', b' of', b' integers', b',', b' sp', b'anning', b' from', b' zero', b' up', b' to', b' the', b' size', b' of', b' the', b' token', b' vocabulary', b'.', b' New', b' ']
a_duration = 45.63701820373535
b_duration = 1.6752548217773438
c_duration = 67.41351199150085
d_duration = 3.273803234100342
e_duration = 1.6066546440124512