Explore the world of Python machine translation with sequence-to-sequence models. Learn the concepts, implementation, and best practices for building your own translation system.
Python Machine Translation: Building Sequence-to-Sequence Models
In today's increasingly interconnected world, the ability to understand and communicate across languages matters more than ever. Machine translation (MT), the automatic translation of text from one language into another, has become a key tool for breaking down language barriers and enabling global communication. Python, with its rich ecosystem of libraries and frameworks, is an excellent platform for building powerful MT systems. This blog post digs into Python machine translation and focuses on sequence-to-sequence (seq2seq) models, a dominant approach in modern MT.
What Is Machine Translation?
Machine translation aims to automate the conversion of text from a source language (e.g., French) into a target language (e.g., English) while preserving its meaning. Early MT systems relied on rule-based approaches that required grammar rules and dictionaries to be defined by hand. These systems were often brittle and struggled with the complexity and nuance of natural language.
Modern MT systems, especially those based on neural networks, have made remarkable progress. They learn to translate by analyzing large amounts of parallel text, that is, texts in two or more languages that are translations of one another.
Sequence-to-Sequence (Seq2Seq) Models for Machine Translation
Sequence-to-sequence models have revolutionized machine translation. They are a family of neural network architectures designed specifically to handle variable-length input and output sequences. That makes them a good fit for MT, where source and target sentences often differ in length and structure.
The Encoder-Decoder Architecture
At the heart of a seq2seq model is the encoder-decoder architecture, which consists of two main components:
- Encoder: The encoder takes the input sequence (the source sentence) and transforms it into a fixed-length vector representation, also called the context vector or thought vector. This vector encapsulates the meaning of the entire input sequence.
- Decoder: The decoder takes the context vector produced by the encoder and generates the output sequence (the target sentence) one word at a time.
Think of the encoder as a summarizer and the decoder as a rewriter. The encoder reads the whole input and summarizes it into a single vector. The decoder then uses that summary to rewrite the text in the target language.
Recurrent Neural Networks (RNNs)
Recurrent neural networks (RNNs), in particular LSTMs (Long Short-Term Memory) and GRUs (Gated Recurrent Units), are commonly used as the building blocks of both the encoder and the decoder. RNNs suit sequential data because they maintain a hidden state that captures information about past inputs, which lets them model dependencies between the words of a sentence.
The encoder RNN reads the source sentence word by word and updates its hidden state at each step. The encoder's final hidden state becomes the context vector, which is passed to the decoder.
The decoder RNN starts with the context vector as its initial hidden state and generates the target sentence word by word. At each step, the decoder takes the previous word and its hidden state as input and produces the next word and an updated hidden state. This continues until the decoder emits an end-of-sentence token (e.g., <EOS>), which marks the end of the translation.
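As a rough illustration of the recurrence described above, the following sketch runs a tiny plain-RNN encoder over two word vectors in pure Python. The weights, the 2-dimensional embeddings, and the `rnn_step` and `encode` names are made up for illustration; a real system would use learned LSTM or GRU layers from a deep learning framework.

```python
import math

def rnn_step(x, h, W_xh, W_hh, b):
    """One recurrence step: h_new = tanh(W_xh x + W_hh h + b)."""
    h_new = []
    for i in range(len(h)):
        total = b[i]
        total += sum(W_xh[i][j] * x[j] for j in range(len(x)))
        total += sum(W_hh[i][j] * h[j] for j in range(len(h)))
        h_new.append(math.tanh(total))
    return h_new

def encode(inputs, W_xh, W_hh, b):
    """Read the inputs in order; return every hidden state and the final one."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_xh, W_hh, b)
        states.append(h)
    return states, h

# Hypothetical 2-dimensional embeddings for "Hello" and "world".
sentence = [[1.0, 0.0], [0.0, 1.0]]
W_xh = [[0.5, -0.3], [0.1, 0.8]]  # input-to-hidden weights (illustrative)
W_hh = [[0.2, 0.0], [0.0, 0.2]]   # hidden-to-hidden weights (illustrative)
b = [0.0, 0.0]

states, context_vector = encode(sentence, W_xh, W_hh, b)
print(len(states), context_vector)
```

The final hidden state plays the role of the context vector; the list of all hidden states is what an attention mechanism would later look at.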
Example: Translating "Hello world" from English to French
Let's walk through how a seq2seq model might translate the simple phrase "Hello world" from English to French.
- Encoding: The encoder RNN reads the words "Hello" and "world" in order. After processing "world", its final hidden state represents the meaning of the whole phrase.
- Context vector: This final hidden state becomes the context vector.
- Decoding: The decoder RNN takes the context vector and starts generating the French translation. It might first produce "Bonjour", then "le", and finally "monde". It also produces an <EOS> token to signal the end of the sentence.
- Output: The final output is "Bonjour le monde <EOS>". Once the <EOS> token is removed, what remains is the translated phrase.
The Attention Mechanism
The basic seq2seq model described above can perform reasonably well, but it suffers from a bottleneck: the meaning of the entire source sentence is compressed into a single fixed-length vector. This becomes a problem for long, complex sentences, because the context vector may not capture all of the relevant information.
The attention mechanism addresses this bottleneck by letting the decoder focus on different parts of the source sentence at each decoding step. Rather than relying on the context vector alone, the decoder attends to the encoder's hidden states from every time step. It can therefore focus selectively on the parts of the source sentence most relevant to the word it is currently generating.
How Attention Works
The attention mechanism typically involves the following steps:
- Compute attention weights: The decoder computes a set of attention weights that express how important each source word is to the current decoding step. The weights are computed with a scoring function that compares the decoder's current hidden state with the encoder's hidden state at each time step.
- Compute the context vector: The attention weights are used to compute a weighted average of the encoder's hidden states. This weighted average becomes the context vector that the decoder uses to generate the next word.
- Decode with attention: The decoder uses the context vector derived from the attention mechanism together with its previous hidden state to predict the next word.
By attending to different parts of the source sentence, the decoder can capture more nuanced, context-specific information, which improves translation quality.
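The first two steps can be sketched in a few lines of plain Python. The scoring function is not specified above, so this sketch assumes a simple dot product between the decoder state and each encoder state (additive and other scoring functions are also used in practice); the `attend` and `softmax` names and all numeric values are illustrative.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_hidden, encoder_states):
    """Dot-product attention over the encoder hidden states."""
    # Step 1: score each encoder state against the decoder state.
    scores = [sum(d * e for d, e in zip(decoder_hidden, state))
              for state in encoder_states]
    weights = softmax(scores)
    # Step 2: the context vector is the weighted average of encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * state[k] for w, state in zip(weights, encoder_states))
               for k in range(dim)]
    return context, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # illustrative values
decoder_hidden = [0.0, 2.0]
context, weights = attend(decoder_hidden, encoder_states)
print(weights, context)
```

Here the second and third encoder states score equally against the decoder state, so they receive equal weight, and the first state receives much less.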
Benefits of Attention
- Improved accuracy: Attention lets the model concentrate on the relevant parts of the input sentence, which leads to more accurate translations.
- Better handling of long sentences: By avoiding the information bottleneck, attention lets the model process long sentences more effectively.
- Interpretability: The attention weights offer insight into which parts of the source sentence the model is focusing on during translation, which helps in understanding how the model makes its decisions.
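To make the interpretability point concrete, the sketch below reads an attention matrix and reports, for each target word, the source word that received the largest weight. The matrix is invented for the "Hello world" example and does not come from a trained model; `strongest_alignments` is a hypothetical helper name.

```python
def strongest_alignments(source_tokens, target_tokens, attention):
    """For each target token, return the source token with the largest weight."""
    pairs = []
    for target, row in zip(target_tokens, attention):
        best = max(range(len(row)), key=lambda i: row[i])
        pairs.append((target, source_tokens[best], row[best]))
    return pairs

source_tokens = ["Hello", "world"]
target_tokens = ["Bonjour", "le", "monde"]
# Made-up attention weights: one row per target token, one column per source token.
attention = [
    [0.9, 0.1],
    [0.3, 0.7],
    [0.1, 0.9],
]

for target, source, weight in strongest_alignments(source_tokens, target_tokens, attention):
    print(f"{target:>8} <- {source} ({weight:.1f})")
```

In practice the same matrix is often drawn as a heatmap, with source words on one axis and target words on the other.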
Building a Machine Translation Model in Python
Let's outline the steps involved in building a machine translation model in Python with a library such as TensorFlow or PyTorch.
1. Data Preparation
The first step is to prepare the data. This means collecting a large dataset of parallel text in which each example pairs a sentence in the source language with its translation in the target language. Public datasets, such as those from the Workshop on Machine Translation (WMT), are often used for this purpose.
Data preparation typically involves the following steps:
- Tokenization: Split sentences into individual words or subwords. Common techniques include whitespace tokenization and byte-pair encoding (BPE).
- Vocabulary creation: Build a vocabulary from all unique tokens in the dataset. Each token is assigned a unique index.
- Padding: Append padding tokens to the end of sentences so that they all have the same length. This is required for batch processing.
- Creating training, validation, and test sets: Split the data into three sets: a training set (for training the model), a validation set (for monitoring performance during training), and a test set (for evaluating the final model).
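The four steps above can be sketched with the standard library alone. This is a minimal illustration under several assumptions: whitespace tokenization only (BPE is not shown), an extra `<UNK>` token for unseen words, and a 10%/10% validation/test split; the function names and special-token strings are hypothetical.

```python
import random

PAD, UNK = "<PAD>", "<UNK>"

def tokenize(sentence):
    """Whitespace tokenization; subword methods such as BPE are not shown."""
    return sentence.split()

def build_vocab(tokenized_sentences):
    """Assign each unique token an index, reserving 0 and 1 for PAD and UNK."""
    vocab = {PAD: 0, UNK: 1}
    for tokens in tokenized_sentences:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab

def encode_and_pad(tokens, vocab, max_len):
    """Map tokens to indices, truncate to max_len, and pad at the end."""
    ids = [vocab.get(t, vocab[UNK]) for t in tokens][:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))

def split_dataset(pairs, val_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle sentence pairs and cut them into train/validation/test lists."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_val = int(len(pairs) * val_fraction)
    n_test = int(len(pairs) * test_fraction)
    return pairs[n_val + n_test:], pairs[:n_val], pairs[n_val:n_val + n_test]

sentences = ["hello world", "hello there world"]
tokenized = [tokenize(s) for s in sentences]
vocab = build_vocab(tokenized)
batch = [encode_and_pad(t, vocab, max_len=4) for t in tokenized]
print(vocab)
print(batch)
```

In a translation setting you would normally build one vocabulary per language (or a shared subword vocabulary) and split at the level of sentence pairs, so that a source sentence and its translation always end up in the same set.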
For example, to train an English-to-Spanish translation model, you need a dataset of English sentences and their Spanish translations. You might preprocess the data by lowercasing all text, removing punctuation, and tokenizing the sentences into words. You would then build a vocabulary of all unique words in each language and pad the sentences to a fixed length.
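A minimal sketch of that normalization, assuming ASCII punctuation plus the Spanish inverted marks should be stripped; the `normalize` helper and the sentence pair are made up for illustration. Note that this is lossy: many modern systems keep casing and punctuation and rely on subword tokenization instead.

```python
import string

def normalize(sentence):
    """Lowercase, strip ASCII punctuation plus Spanish inverted marks, and split."""
    punctuation = string.punctuation + "¿¡"
    table = str.maketrans("", "", punctuation)
    return sentence.lower().translate(table).split()

pair = ("How are you?", "¿Cómo estás?")
source_tokens = normalize(pair[0])
target_tokens = normalize(pair[1])
print(source_tokens, target_tokens)
```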
2. Model Implementation
The next step is to implement a seq2seq model with attention using a deep learning framework such as TensorFlow or PyTorch. This involves defining the encoder, the decoder, and the attention mechanism.
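A simplified outline of the code is shown below, written here as a compact PyTorch sketch. The scoring layer in the attention module and the use of teacher forcing in the training-time forward pass are illustrative choices, not the only options.
import torch
import torch.nn as nn

# Define the encoder
class Encoder(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)

    def forward(self, input_sequence):
        # input_sequence: (batch, src_len) tensor of token indices
        embedded = self.embedding(input_sequence)
        hidden_states, last_hidden_state = self.lstm(embedded)
        # hidden_states: (batch, src_len, hidden_dim); last_hidden_state: (h_n, c_n)
        return hidden_states, last_hidden_state

# Define the attention mechanism
class Attention(nn.Module):
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, decoder_hidden, encoder_hidden_states):
        # Score every encoder state against the current decoder state
        query = self.score(decoder_hidden).unsqueeze(2)              # (batch, hidden_dim, 1)
        scores = torch.bmm(encoder_hidden_states, query).squeeze(2)  # (batch, src_len)
        attention_weights = torch.softmax(scores, dim=1)
        # The context vector is the weighted average of the encoder states
        context_vector = torch.bmm(attention_weights.unsqueeze(1), encoder_hidden_states).squeeze(1)
        return context_vector, attention_weights

# Define the decoder
class Decoder(nn.Module):
    def __init__(self, output_dim, embedding_dim, hidden_dim, num_layers, attention):
        super().__init__()
        self.embedding = nn.Embedding(output_dim, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True)
        self.attention = attention
        self.fc = nn.Linear(hidden_dim * 2, output_dim)

    def forward(self, input_word, hidden_state, encoder_hidden_states):
        # Process the previous target word through the embedding and the LSTM
        embedded = self.embedding(input_word).unsqueeze(1)  # (batch, 1, embedding_dim)
        output, hidden_state = self.lstm(embedded, hidden_state)
        output = output.squeeze(1)
        # Apply the attention mechanism
        context_vector, _ = self.attention(output, encoder_hidden_states)
        # Predict the next word (unnormalized scores over the target vocabulary)
        predicted_word = self.fc(torch.cat([output, context_vector], dim=1))
        return predicted_word, hidden_state

# Define the Seq2Seq model
class Seq2Seq(nn.Module):
    def __init__(self, encoder, decoder):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, source_sequence, target_sequence):
        # Encode the source sequence
        encoder_hidden_states, hidden_state = self.encoder(source_sequence)
        # Decode step by step, feeding the reference token at each step (teacher forcing)
        outputs = []
        for t in range(target_sequence.size(1) - 1):
            logits, hidden_state = self.decoder(target_sequence[:, t], hidden_state, encoder_hidden_states)
            outputs.append(logits)
        predicted_sequence = torch.stack(outputs, dim=1)  # (batch, tgt_len - 1, output_dim)
        return predicted_sequence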
3. Training the Model
Once the model is implemented, it has to be trained on the training data. This means feeding the model source sentences with their corresponding target sentences and adjusting the model's parameters to minimize the difference between the predicted and the actual translations.
The training process typically involves the following steps:
- Define the loss function: Choose a loss function that measures the difference between the predicted and the actual translations. Cross-entropy loss is a common choice.
- Define the optimizer: Choose an optimization algorithm that updates the model's parameters to minimize the loss. Common optimizers include Adam and SGD.
- Training loop: Iterate over the training data, feeding batches of source and target sentences to the model. For each batch, compute the loss, compute the gradients, and update the model's parameters.
- Validation: Periodically evaluate the model on the validation set. This helps you monitor training and detect overfitting early.
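To show what the cross-entropy loss measures for a single target sentence, here is a small pure-Python sketch. It assumes the model outputs one probability distribution per time step and that padded positions (index 0 here, an assumption) are skipped; the probabilities are invented. Frameworks provide optimized versions of this computation.

```python
import math

def cross_entropy(step_probs, target_ids, pad_id=0):
    """Average negative log-probability of the reference tokens, skipping padding."""
    total, count = 0.0, 0
    for probs, target in zip(step_probs, target_ids):
        if target == pad_id:
            continue  # padded positions carry no training signal
        total += -math.log(probs[target])
        count += 1
    return total / count if count else 0.0

# Made-up predicted distributions over a 4-token vocabulary, one per time step.
step_probs = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.25, 0.25, 0.25, 0.25],
]
target_ids = [1, 2, 0]  # the last position is padding
loss = cross_entropy(step_probs, target_ids)
print(round(loss, 4))
```

The loss shrinks toward zero as the model assigns more probability to the reference token at each step, which is what the optimizer is trying to achieve.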
You typically train the model for several epochs, where each epoch is one full pass over the training dataset. During training, monitor the loss on both the training set and the validation set. If the validation loss starts to rise, the model is probably overfitting the training data, and you may need to stop training or adjust the model's hyperparameters.
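One common way to act on a rising validation loss is early stopping with a patience counter. The sketch below is one possible rule, not a prescribed one: it stops once the validation loss has failed to improve for `patience` consecutive epochs. The loss values and the `early_stopping_epoch` name are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the index of the epoch at which training would stop.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: ran all epochs

val_losses = [2.1, 1.6, 1.4, 1.5, 1.7, 1.9]  # made-up validation losses
stop_at = early_stopping_epoch(val_losses)
print(stop_at)
```

With these numbers the best loss occurs at epoch 2, and training stops at epoch 4 after two epochs without improvement; in practice you would also keep the checkpoint from the best epoch.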
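4. Evaluation
After training, evaluate the model on the test set to measure its performance. Common evaluation metrics for machine translation include the BLEU (Bilingual Evaluation Understudy) score and METEOR.
The BLEU score measures how similar the predicted translations are to reference translations. It computes the precision of n-grams (sequences of n words) in the predicted translation against the reference translation, combined with a penalty for translations that are too short.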
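The n-gram precision idea can be sketched for a single sentence pair as follows. This is a simplified, illustrative version: it uses one reference, n-grams up to length 2 by default (standard BLEU uses up to 4 and is computed over a whole corpus), and no smoothing. For reported results, use an established implementation rather than this sketch.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most as
    often as it appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return overlap / total if total else 0.0

def sentence_bleu(candidate, reference, max_n=2):
    """Geometric mean of the 1..max_n precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_mean)

reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()
print(modified_precision(candidate, reference, 1))
print(sentence_bleu(candidate, reference))
```

In this example five of the six candidate unigrams and three of the five candidate bigrams appear in the reference, and no brevity penalty applies because the two sentences have the same length.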
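To evaluate the model, feed it the source sentences from the test set and generate the corresponding translations. Then compare the generated translations with the reference translations using the BLEU score or another evaluation metric.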
5. Inference
Once the model has been trained and evaluated, it can be used to translate new sentences. This means feeding a source sentence to the model and generating the corresponding target sentence.
The inference process typically involves the following steps:
- Tokenize the input sentence: Tokenize the source sentence into words or subwords.
- Encode the input sentence: Feed the tokenized sentence to the encoder to obtain the context vector.
- Decode the target sentence: Use the decoder to generate the target sentence one word at a time, starting from a start-of-sentence token (e.g., <SOS>). At each step, the decoder takes the previous word and the context vector as input and generates the next word. This continues until the decoder emits an end-of-sentence token (e.g., <EOS>).
- Post-processing: Remove the <SOS> and <EOS> tokens from the generated sentence and detokenize the words to obtain the final translation.
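The decoding and post-processing steps can be sketched as a greedy loop. The search strategy is not specified above, so greedy selection is an assumption here (beam search is a common alternative), and the trained decoder is replaced by a hand-written lookup table, `TOY_TABLE`, purely so that the example runs on its own. A `max_len` cap is added as a safeguard in case <EOS> is never produced.

```python
SOS, EOS = "<SOS>", "<EOS>"

def greedy_decode(next_token_scores, context, max_len=20):
    """Generate tokens one at a time, always taking the highest-scoring one."""
    output = [SOS]
    while len(output) < max_len:
        scores = next_token_scores(output[-1], context)
        token = max(scores, key=scores.get)
        output.append(token)
        if token == EOS:
            break
    return output

def postprocess(tokens):
    """Drop the special tokens and join the words back into a sentence."""
    return " ".join(t for t in tokens if t not in (SOS, EOS))

# Stand-in for a trained decoder: a hand-written table of next-token scores.
TOY_TABLE = {
    SOS: {"Bonjour": 0.9, "le": 0.1},
    "Bonjour": {"le": 0.8, "monde": 0.2},
    "le": {"monde": 0.95, EOS: 0.05},
    "monde": {EOS: 1.0},
}

def toy_decoder_step(previous_token, context):
    # A real decoder would also use `context` and its own hidden state.
    return TOY_TABLE[previous_token]

tokens = greedy_decode(toy_decoder_step, context=None)
print(postprocess(tokens))
```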
Libraries and Frameworks for Machine Translation in Python
Python offers a rich ecosystem of libraries and frameworks that make it easier to develop machine translation models. Some of the most popular options are:
- TensorFlow: A powerful, versatile deep learning framework developed by Google. TensorFlow provides a wide range of tools and APIs for building and training neural networks, including seq2seq models with attention.
- PyTorch: Another popular deep learning framework, known for its flexibility and ease of use. PyTorch is particularly well suited to research and experimentation and has strong support for seq2seq models.
- Hugging Face Transformers: A library that provides pre-trained Transformer-based language models such as BERT and BART. Sequence-to-sequence models such as BART can be fine-tuned for machine translation tasks.
- OpenNMT-py: An open-source neural machine translation toolkit written in PyTorch. It provides a flexible, modular framework for building and experimenting with different MT architectures.
- Marian NMT: A fast neural machine translation framework written in C++ with Python bindings. It is designed for efficient training and inference on GPUs.
Challenges in Machine Translation
Despite major progress in recent years, machine translation still faces several challenges:
- Ambiguity: Natural language is inherently ambiguous. Words can have multiple meanings, and sentences can be interpreted in different ways, which makes it hard for MT systems to translate text accurately.
- Idioms and figurative language: Idioms and figurative language (e.g., metaphors and similes) can be difficult for MT systems to handle, because these expressions often mean something different from the literal meaning of their individual words.
- Low-resource languages: MT systems usually need large amounts of parallel text to train effectively, and such data is often scarce for low-resource languages.
- Domain adaptation: An MT system trained on one domain (e.g., news articles) may perform poorly on another (e.g., medical texts). Adapting MT systems to new domains is an ongoing research challenge.
- Ethical considerations: MT systems can perpetuate biases present in their training data, and these biases need to be addressed to keep MT systems fair and equitable. For example, if a training dataset associates particular professions with a particular gender, the MT system may reinforce those stereotypes.
Future Directions in Machine Translation
The field of machine translation is constantly evolving. Some of the main directions for the future are:
- Transformer-based models: Transformer-based models such as BERT, BART, and T5 have achieved state-of-the-art results on a wide range of NLP tasks, including machine translation. These models are built on the attention mechanism and capture long-range dependencies between words in a sentence more effectively than RNNs.
- Zero-shot translation: Zero-shot translation aims to translate between language pairs for which no parallel text is available. It is usually achieved by training a multilingual MT model on a set of languages and then translating between language pairs that were never seen together during training.
- Multilingual machine translation: Multilingual MT models are trained on data from multiple languages and can translate between any pair of languages in the dataset. This is more efficient than training a separate model for each language pair.
- Improving low-resource translation: Researchers are exploring a variety of techniques to improve MT performance for low-resource languages, including synthetic data, transfer learning, and unsupervised learning.
- Incorporating context: MT systems increasingly incorporate contextual information, such as the document or conversation in which a sentence appears, to improve translation accuracy.
- Explainable machine translation: Research is under way to make MT systems more explainable, so that users can understand why a system produced a particular translation. This helps build trust in MT systems and identify potential errors.
Real-World Applications of Machine Translation
Machine translation is used in a wide range of real-world applications, including:
- Global business communication: Enables companies to communicate with customers, partners, and employees in different languages. For example, a multinational company might use MT to translate emails, documents, and websites.
- International travel: Helps travelers understand foreign languages and navigate unfamiliar environments. MT apps can translate signs, menus, and conversations.
- Content localization: Adapts content to different languages and cultures, including the translation of websites, software, and marketing materials. For example, a video game developer might use MT to localize a game for different regions.
- Access to information: Provides access to information in different languages. MT can be used to translate news articles, research papers, and other online content.
- E-commerce: Facilitates cross-border e-commerce by translating product descriptions, customer reviews, and support materials.
- Education: Supports language learning and cross-cultural understanding. MT can be used to translate textbooks, educational materials, and online courses.
- Government and diplomacy: Helps government agencies and diplomats communicate with foreign governments and organizations.
Conclusion
Machine translation has advanced considerably in recent years thanks to the development of sequence-to-sequence models and the attention mechanism. Python, with its rich ecosystem of libraries and frameworks, is an excellent platform for building powerful MT systems. Challenges remain, but ongoing research and development is paving the way for more accurate and more versatile MT systems. As MT technology continues to evolve, it will play an increasingly important role in breaking down language barriers and fostering global communication and understanding.
Whether you are a researcher, a developer, or simply someone interested in the power of machine translation, exploring Python-based seq2seq models is a rewarding endeavor. With the knowledge and tools covered in this blog post, you can set out on your own journey to build and deploy machine translation systems that connect people around the world.