An overview of how gradient descent variants evolved and how they are applied in practice, as a cornerstone of modern machine learning and deep learning.
Mastering Optimization: An In-Depth Look at Gradient Descent Variants
In machine learning and deep learning, the ability to train complex models effectively depends on powerful optimization algorithms. At the heart of many of these techniques lies gradient descent, a fundamental iterative approach to finding the minimum of a function. The core concept is elegant, but practical use often benefits from a family of refined variants, each designed to address a specific challenge and speed up learning. This guide examines the most prominent gradient descent variants and explores how they work, their advantages and disadvantages, and where they are applied around the world.
The Foundation: Understanding Gradient Descent
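Before analyzing the advanced forms, it is important to grasp the basics of gradient descent. Imagine standing on top of a mountain shrouded in fog and trying to reach the lowest point (the valley). You cannot see the whole landscape, only the slope immediately around you. Gradient descent works the same way: it iteratively adjusts the model's parameters (weights and biases) in the direction opposite to the gradient of the loss function. The gradient points in the direction of steepest ascent, so moving the opposite way reduces the loss.
The update rule for standard gradient descent (also called batch gradient descent) is: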
w = w - learning_rate * ∇J(w)
Where:
- w represents the model's parameters.
- learning_rate is a hyperparameter that controls the step size.
- ∇J(w) is the gradient of the loss function J with respect to the parameters w.
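Key characteristics of batch gradient descent:
- Pros: With a suitably small learning rate, it converges to the global minimum for convex functions and to a local minimum (more precisely, a stationary point) for non-convex functions. It follows a stable convergence path.
- Cons: It can be very expensive computationally, especially on large datasets, because every iteration computes the gradient over the entire training set. This makes it impractical for the massive datasets common in modern deep learning.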
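To make the update rule concrete, here is a minimal sketch of batch gradient descent in plain Python. The one-dimensional linear model, the mean squared error loss, the toy data, and the function name `batch_gradient_descent` are all illustrative assumptions, not something this article prescribes.

```python
def batch_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of J(w, b) = (1/n) * sum((w*x + b - y)^2) over the WHOLE dataset.
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # w = w - learning_rate * grad J(w)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = batch_gradient_descent(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Every step touches all training examples, which is exactly the cost that becomes prohibitive on large datasets.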
Addressing the Scalability Challenge: Stochastic Gradient Descent (SGD)
The computational burden of batch gradient descent led to the development of stochastic gradient descent (SGD). Instead of using the entire dataset, SGD updates the parameters at each step using the gradient computed from a single, randomly selected training example.
The update rule for SGD is:
w = w - learning_rate * ∇J(w; x^(i); y^(i))
Here (x^(i), y^(i)) is a single training example.
Key characteristics of SGD:
- Pros: Each update is far cheaper than a batch gradient descent update, which makes training much faster on large datasets. The noise introduced by using individual examples can help the optimizer escape shallow local minima.
- Cons: The updates are much noisier, which leads to an erratic convergence path. The learning process can oscillate around the minimum and, because of that oscillation, may never settle on the exact minimum.
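The single-example update can be sketched as follows. This is an illustrative example under the same assumptions as a simple linear model with squared error; the function name `sgd`, the toy data, and the fixed seed are hypothetical choices made so the run is reproducible.

```python
import random


def sgd(xs, ys, learning_rate=0.01, steps=5000, seed=0):
    """Fit y ~ w * x + b using one randomly chosen example per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        i = rng.randrange(len(xs))       # pick a single training example
        error = w * xs[i] + b - ys[i]    # prediction error on that example
        # Gradient of (w*x_i + b - y_i)^2 with respect to w and b.
        w -= learning_rate * 2.0 * error * xs[i]
        b -= learning_rate * 2.0 * error
    return w, b


# Noise-free toy data from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = sgd(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Because this toy data is noise-free, every per-example gradient vanishes at the solution and the iterates settle down. On noisy data the same loop would keep jittering around the minimum, which is the oscillation described above.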
Global application example: A startup in Nairobi building a mobile app for agricultural advice could use SGD to train a complex image recognition model that identifies crop diseases from photos uploaded by users. Handling the large volume of images taken by users around the world calls for a scalable optimization approach such as SGD.
A Compromise: Mini-Batch Gradient Descent
Mini-batch gradient descent strikes a balance between batch gradient descent and SGD. It updates the parameters using the gradient computed from a small random subset of the training data, known as a mini-batch.
The update rule for mini-batch gradient descent is:
w = w - learning_rate * ∇J(w; x^(i:i+m); y^(i:i+m))
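Here x^(i:i+m) and y^(i:i+m) denote a mini-batch of size m.
Key characteristics of mini-batch gradient descent:
- Pros: It offers a good compromise between computational efficiency and convergence stability. It reduces the variance of the updates compared with SGD, which gives smoother convergence. It also allows parallelization, which speeds up computation.
- Cons: It introduces an additional hyperparameter, the mini-batch size.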
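A minimal sketch of the mini-batch loop is shown below. Shuffling the indices once per epoch and slicing them into batches is one common way to draw mini-batches; that choice, the function name `minibatch_gd`, and the toy data are assumptions made for illustration.

```python
import random


def minibatch_gd(xs, ys, batch_size=2, learning_rate=0.02, epochs=2000, seed=0):
    """Fit y ~ w * x + b, averaging the gradient over small random batches."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # new random batches every epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            m = len(batch)    # the last batch may be smaller
            grad_w = sum(2.0 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / m
            grad_b = sum(2.0 * (w * xs[i] + b - ys[i]) for i in batch) / m
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Toy data from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = minibatch_gd(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Setting `batch_size=1` recovers SGD and setting it to `len(xs)` recovers batch gradient descent, which is why the mini-batch size acts as a dial between the two.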
Global application example: A global e-commerce platform operating in markets as varied as São Paulo, Seoul, and Stockholm could use mini-batch gradient descent to train its recommendation engine. Processing millions of customer interactions efficiently while keeping convergence stable is essential for offering personalized suggestions across different cultural preferences.
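Accelerating Convergence: Momentum
One of the main challenges in optimization is navigating ravines (regions where the surface is much steeper in one dimension than in another) and plateaus. Momentum addresses this by introducing a "velocity" term that accumulates past gradients. This lets the optimizer keep moving in the same direction even when the current gradient is small, and it dampens oscillation in directions where the gradient changes sign frequently.
The update rule with momentum is: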
v_t = γ * v_{t-1} + learning_rate * ∇J(w_t)
w_{t+1} = w_t - v_t
Where:
- v_t is the velocity at time step t.
- γ (gamma) is the momentum coefficient, typically set between 0.8 and 0.99.
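Key characteristics of momentum:
- Pros: It accelerates convergence, especially in directions where the gradient is consistent. It helps the optimizer get past local minima and saddle points. It follows a smoother trajectory than standard SGD.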
- Cons: It adds another hyperparameter (γ) that needs tuning. If the momentum is too high, the optimizer can overshoot the minimum.
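The two-line momentum rule can be tried out on a small "ravine". The sketch below assumes a hypothetical quadratic loss J(w) = 0.5 * (w[0]^2 + 50 * w[1]^2), which is 50 times steeper along the second coordinate; the function names and starting point are illustrative. Setting `gamma=0.0` reduces the same loop to plain gradient descent, which gives a simple baseline.

```python
import math


def momentum_descent(grad, w0, learning_rate=0.02, gamma=0.9, steps=500):
    """Gradient descent with a velocity term that accumulates past gradients."""
    w = list(w0)
    v = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        # v_t = gamma * v_{t-1} + learning_rate * grad J(w_t)
        v = [gamma * vi + learning_rate * gi for vi, gi in zip(v, g)]
        # w_{t+1} = w_t - v_t
        w = [wi - vi for wi, vi in zip(w, v)]
    return w


def ravine_grad(w):
    # Gradient of the assumed loss J(w) = 0.5 * (w[0]**2 + 50 * w[1]**2).
    return [w[0], 50.0 * w[1]]


def norm(w):
    return math.sqrt(sum(x * x for x in w))


with_momentum = momentum_descent(ravine_grad, [10.0, 1.0])
without_momentum = momentum_descent(ravine_grad, [10.0, 1.0], gamma=0.0)
print("distance to minimum with momentum:   ", norm(with_momentum))
print("distance to minimum without momentum:", norm(without_momentum))
```

With the same learning rate and step budget, the momentum run ends much closer to the minimum at the origin because the velocity keeps building along the shallow direction, where plain gradient descent crawls.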
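Global application example: A financial institution in London using machine learning to predict stock market movements could take advantage of momentum. Financial data is inherently volatile and produces noisy gradients, so momentum is valuable for reaching faster, more stable convergence toward good trading strategies.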
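Adaptive Learning Rates: RMSprop
The learning rate is a critical hyperparameter. If it is too high, the optimizer may diverge; if it is too low, convergence can be extremely slow. RMSprop (Root Mean Square Propagation) addresses this by adapting the learning rate for each parameter individually. It divides the learning rate by the square root of a moving average of that parameter's recent squared gradients.
The update rule for RMSprop is: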
E[g^2]_t = γ * E[g^2]_{t-1} + (1 - γ) * (∇J(w_t))^2
w_{t+1} = w_t - (learning_rate / sqrt(E[g^2]_t + ε)) * ∇J(w_t)
Where:
- E[g^2]_t is the decaying average of squared gradients.
- γ (gamma) is the decay rate (typically around 0.9).
- ε (epsilon) is a small constant that prevents division by zero (for example, 1e-8).
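Key characteristics of RMSprop:
- Pros: It adapts the learning rate per parameter, which makes it effective for sparse gradients or when different parameters need updates of different magnitudes. In practice it often converges faster than SGD with momentum.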
- Cons: It still requires tuning the initial learning rate and the decay rate γ.
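As a rough illustration, the RMSprop rule above can be written out per parameter in plain Python. The quadratic test loss, the starting point, and the names `rmsprop` and `ravine_grad` are assumptions for this sketch; ε is placed inside the square root to match the formula given here.

```python
import math


def rmsprop(grad, w0, learning_rate=0.01, gamma=0.9, epsilon=1e-8, steps=2000):
    """Scale each parameter's step by a running RMS of its own gradients."""
    w = list(w0)
    avg_sq = [0.0] * len(w)  # E[g^2], one entry per parameter
    for _ in range(steps):
        g = grad(w)
        # E[g^2]_t = gamma * E[g^2]_{t-1} + (1 - gamma) * g_t^2
        avg_sq = [gamma * a + (1.0 - gamma) * gi * gi for a, gi in zip(avg_sq, g)]
        # w_{t+1} = w_t - learning_rate / sqrt(E[g^2]_t + epsilon) * g_t
        w = [wi - learning_rate / math.sqrt(a + epsilon) * gi
             for wi, a, gi in zip(w, avg_sq, g)]
    return w


def ravine_grad(w):
    # Gradient of the assumed loss J(w) = 0.5 * (w[0]**2 + 50 * w[1]**2).
    return [w[0], 50.0 * w[1]]


w = rmsprop(ravine_grad, [5.0, 1.0])
print(w)
```

Because each gradient is divided by its own running magnitude, both coordinates move at a similar pace even though one direction is 50 times steeper. With a constant learning rate the iterates end up hovering near the minimum rather than landing on it exactly, which is one reason the learning rate still needs tuning or scheduling.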
Global application example: A multinational technology company in Silicon Valley building a natural language processing (NLP) model for sentiment analysis across several languages (for example, Mandarin, Spanish, and French) could benefit from RMSprop. Different linguistic structures and word frequencies can produce gradients of widely varying magnitude, and RMSprop handles this by adapting the learning rate for each model parameter.
The All-Rounder: Adam (Adaptive Moment Estimation)
Often regarded as the go-to optimizer for many deep learning tasks, Adam combines the benefits of momentum and RMSprop. It tracks both an exponentially decaying average of past gradients (as momentum does) and an exponentially decaying average of past squared gradients (as RMSprop does).
The update rule for Adam is:
m_t = β1 * m_{t-1} + (1 - β1) * ∇J(w_t)
v_t = β2 * v_{t-1} + (1 - β2) * (∇J(w_t))^2
# Bias correction
m_hat_t = m_t / (1 - β1^t)
v_hat_t = v_t / (1 - β2^t)
# Update parameters
w_{t+1} = w_t - (learning_rate / (sqrt(v_hat_t) + ε)) * m_hat_t
Where:
- m_t is the first moment estimate (the mean of the gradients).
- v_t is the second moment estimate (the uncentered variance of the gradients).
- β1 and β2 are the decay rates of the moment estimates (typically 0.9 and 0.999, respectively).
- t is the current time step.
- ε (epsilon) is a small constant for numerical stability.
Key characteristics of Adam:
- Pros: It often converges quickly and needs less hyperparameter tuning than other methods. It is well suited to problems with large datasets and high-dimensional parameter spaces. It combines the benefits of adaptive learning rates and momentum.
- Cons: In some scenarios it converges to a worse solution than carefully tuned SGD with momentum. The bias-correction terms matter most in the early stages of training.
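The Adam steps can be sketched in plain Python as follows. The quadratic test loss, starting point, and function names are assumptions for illustration. The denominator is written as sqrt(v_hat) + ε, the form used in the original Adam paper; some write-ups place ε inside the square root instead, and the difference is negligible for small ε.

```python
import math


def adam(grad, w0, learning_rate=0.01, beta1=0.9, beta2=0.999,
         epsilon=1e-8, steps=2000):
    """Adam: momentum-style first moment plus RMSprop-style second moment."""
    w = list(w0)
    m = [0.0] * len(w)  # first moment estimate
    v = [0.0] * len(w)  # second moment estimate
    for t in range(1, steps + 1):
        g = grad(w)
        m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, g)]
        # Bias correction: m and v start at zero, so early values are too small.
        m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
        v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
        w = [wi - learning_rate * mh / (math.sqrt(vh) + epsilon)
             for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w


def ravine_grad(w):
    # Gradient of the assumed loss J(w) = 0.5 * (w[0]**2 + 50 * w[1]**2).
    return [w[0], 50.0 * w[1]]


w = adam(ravine_grad, [1.0, 1.0])
print(w)
```

At t = 1 the bias correction divides m by (1 - β1) and v by (1 - β2), so the very first step has magnitude close to `learning_rate` regardless of the gradient's scale. Without the correction, the early steps would be distorted by the zero initialization.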
Global application example: A research lab in Berlin developing autonomous driving systems could use Adam to train sophisticated neural networks that process real-time sensor data from vehicles operating around the world. The complex, high-dimensional nature of the problem and the need for efficient, robust training make Adam a strong candidate.
Other Notable Variants and Considerations
Adam, RMSprop, and momentum are widely used, but several other variants offer advantages of their own:
- Adagrad (Adaptive Gradient): Adapts the learning rate by dividing it by the square root of the sum of all past squared gradients. It works well on sparse data, but the learning rate can become vanishingly small over time, which may halt learning prematurely.
- Adadelta: An extension of Adagrad that aims to fix its shrinking learning rate. Like RMSprop, it uses a decaying average of past squared gradients, and it additionally adapts the update step size based on a decaying average of past updates.
- Nadam: Incorporates Nesterov momentum into Adam, which often yields slightly better performance.
- AdamW: Decouples weight decay from the gradient-based update in Adam, which can improve generalization.
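The shrinking step size attributed to Adagrad above is easy to see numerically. The sketch below tracks only the effective per-parameter learning rate for a hypothetical stream of gradients; the helper name `adagrad_effective_rates` and the constant gradient sequence are assumptions for illustration.

```python
import math


def adagrad_effective_rates(gradients, learning_rate=0.1, epsilon=1e-8):
    """Return the effective learning rate Adagrad would use at each step."""
    accumulated = 0.0  # running SUM (not average) of squared gradients
    rates = []
    for g in gradients:
        accumulated += g * g
        rates.append(learning_rate / math.sqrt(accumulated + epsilon))
    return rates


# A constant gradient of 1.0 for five steps (illustrative input).
rates = adagrad_effective_rates([1.0] * 5)
for step, rate in enumerate(rates, start=1):
    print(f"step {step}: effective learning rate = {rate:.4f}")
```

Since the accumulator only ever grows, the effective rate falls roughly as 1/sqrt(t) here. RMSprop and Adadelta replace the sum with a decaying average precisely so that old gradients stop shrinking the step.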
Learning Rate Scheduling
Whichever optimizer you choose, the learning rate often needs to be adjusted during training. Common strategies include:
- Step decay: Reduce the learning rate by a fixed factor at specific epochs.
- Exponential decay: Reduce the learning rate exponentially over time.
- Cyclical learning rates: Vary the learning rate periodically between a lower and an upper bound. This can help the optimizer escape saddle points and find flatter minima.
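The three strategies above can be sketched as small pure functions. The parameter names, default values, and the triangular shape chosen for the cyclical schedule are assumptions for illustration; frameworks expose their own scheduler APIs with different signatures.

```python
import math


def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


def exponential_decay(initial_lr, epoch, decay_rate=0.05):
    """Shrink the learning rate smoothly as exp(-decay_rate * epoch)."""
    return initial_lr * math.exp(-decay_rate * epoch)


def cyclical_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cycle: base_lr -> max_lr -> base_lr every 2 * step_size iterations."""
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


for epoch in (0, 10, 25):
    print(epoch, step_decay(0.1, epoch), exponential_decay(0.1, epoch))
for iteration in (0, 50, 100, 150, 200):
    print(iteration, cyclical_lr(iteration, base_lr=0.001, max_lr=0.01, step_size=100))
```

In a training loop, the value returned for the current epoch or iteration would be passed to the optimizer as its `learning_rate` before each update.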
Choosing the Right Optimizer
The choice of optimizer is often empirical and depends on the specific problem, dataset, and model architecture. Still, a few general guidelines apply:
- Start with Adam: It is a robust default for many deep learning tasks.
- Consider SGD with momentum: If Adam struggles to converge or behaves erratically, SGD with momentum combined with careful learning rate scheduling can be a strong alternative, and it often generalizes better.
- Experiment: Always try different optimizers and hyperparameters on a validation set to find the best configuration.
Conclusion: The Art and Science of Optimization
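Gradient descent and its variants are the engines that drive learning in many machine learning models. From the basic simplicity of SGD to the refined adaptivity of Adam, each algorithm offers its own way of navigating the complex landscape of a loss function. Understanding the nuances of these optimizers, along with their strengths and weaknesses, is essential for any practitioner who wants to build high-performing, efficient, and reliable AI systems at global scale. As the field continues to evolve, optimization techniques will evolve with it and keep pushing the boundaries of what artificial intelligence can do.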