Deep Dual Learning and Its Applications

Authors

Department of Computer Engineering, Sharif University of Technology, Tehran, Iran

Abstract

Deep learning has achieved remarkable results on many artificial intelligence problems, yet it suffers from the fact that its performance depends heavily on the amount of labeled data. In many real-world applications, labeled samples are scarce and costly to collect, while unlabeled samples are usually available in abundance. Methods that exploit unlabeled samples effectively have therefore attracted considerable attention. Moreover, many artificial intelligence problems come in dual forms; for example, English-to-Persian translation versus Persian-to-English translation, or image classification versus image generation. In recent years, numerous methods have been proposed to exploit the correlation between such dual tasks. In this paper, we survey dual learning methods, whose goal is to exploit the duality between two dual tasks effectively during training and/or inference. Dual learning can be divided into three levels: duality at the data level, at the model level, and at the inference level. We review the various ways these ideas have been put to use and their successes across different applications, and we show how dual learning effectively reduces the need for labeled data.
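To make the core idea concrete, the following is a minimal, hypothetical sketch (in PyTorch, not taken from any of the surveyed works) of data-level duality: a primal model f: X→Y and a dual model g: Y→X are trained jointly, and the reconstruction losses g(f(x)) ≈ x and f(g(y)) ≈ y supply a training signal from unlabeled data on top of whatever labeled pairs are available. All model choices, names, and dimensions below are illustrative assumptions.

```python
# Minimal sketch of dual (data-level) training: a primal model f: X -> Y and a
# dual model g: Y -> X are trained with (a) supervised losses on the few
# labeled pairs and (b) reconstruction losses on abundant unlabeled data.
# Hypothetical toy example; linear models and random data are placeholders.
import torch
import torch.nn as nn

dim_x, dim_y = 8, 8
f = nn.Linear(dim_x, dim_y)   # primal task X -> Y
g = nn.Linear(dim_y, dim_x)   # dual task   Y -> X
opt = torch.optim.Adam(list(f.parameters()) + list(g.parameters()), lr=1e-3)
mse = nn.MSELoss()

# Toy data: a handful of labeled pairs, many unlabeled x and y samples.
x_lab, y_lab = torch.randn(16, dim_x), torch.randn(16, dim_y)
x_unlab, y_unlab = torch.randn(256, dim_x), torch.randn(256, dim_y)

for step in range(100):
    opt.zero_grad()
    # Supervised losses on the scarce labeled pairs.
    sup = mse(f(x_lab), y_lab) + mse(g(y_lab), x_lab)
    # Dual reconstruction losses on unlabeled data:
    # mapping X -> Y -> X (and Y -> X -> Y) should return to the start.
    recon = mse(g(f(x_unlab)), x_unlab) + mse(f(g(y_unlab)), y_unlab)
    (sup + recon).backward()
    opt.step()
```

In this sketch the reconstruction terms are what let the two models learn from unlabeled samples, which is how dual learning reduces the dependence on labeled data discussed above.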

Keywords

Volume 17, Issue 2, Fall and Winter, Azar 1398 (2019)