Improving Rule-Based Machine Translation Using Statistical Syntactic Rules

Authors

School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran

Abstract

Rule-based machine translation uses a set of rules that encode linguistic information in the translation process. The output of such translators is usually better than that of statistical translators in terms of grammar and word order. However, research has shown that these translations are weaker than statistical ones in fluency and in the choice of appropriate words. This paper aims to improve lexical choice in a rule-based translator. To this end, it uses a set of syntactic-lexical rules based on Tree-Adjoining Grammar (TAG). These probabilistic rules are extracted statistically from a large parallel corpus. In the proposed system, words are placed in the target language in the order suggested by the rule-based translator; for this reason, a monotone decoder based on dynamic programming is used to translate sentences. The best translation is selected according to the probabilities of the rules applied and the language model score. Experiments on English-to-Persian translation showed that the output of the proposed method scores about 3.1 BLEU points higher than the translation produced by the baseline rule-based translator.
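The monotone decoding step described above can be illustrated with a minimal sketch. It assumes that each target position (already ordered by the rule-based translator) comes with a list of candidate words and rule log-probabilities, and that the language model is a simple bigram model. The function name `monotone_decode`, the `lm_weight` parameter, the toy candidates, and the bigram table are all hypothetical and are not taken from the paper; the actual system scores TAG-based rules rather than single words.

```python
import math

def monotone_decode(candidates, bigram_logprob, lm_weight=1.0):
    """Viterbi-style monotone decoding (illustrative sketch only).

    candidates: one non-empty list per target position, in the fixed
    order given by the rule-based translator; each item is a pair
    (target_word, rule_log_probability).
    bigram_logprob: function (previous_word, word) -> log-probability.
    Returns (best_score, best_word_sequence).
    """
    # best maps the last chosen word to (score, path) of the best
    # hypothesis ending in that word.
    best = {"<s>": (0.0, [])}
    for options in candidates:
        new_best = {}
        for word, rule_lp in options:
            for prev, (score, path) in best.items():
                total = score + rule_lp + lm_weight * bigram_logprob(prev, word)
                if word not in new_best or total > new_best[word][0]:
                    new_best[word] = (total, path + [word])
        best = new_best
    return max(best.values(), key=lambda item: item[0])

# Hypothetical toy data: two candidates for the first position,
# one candidate for the second (romanized placeholder tokens).
toy_candidates = [
    [("bank", math.log(0.6)), ("sahel", math.log(0.4))],
    [("rud", math.log(1.0))],
]

# Hypothetical bigram language model with a small floor probability.
toy_bigrams = {
    ("<s>", "bank"): 0.5,
    ("<s>", "sahel"): 0.5,
    ("bank", "rud"): 0.05,
    ("sahel", "rud"): 0.6,
}

def toy_lm(prev, word):
    return math.log(toy_bigrams.get((prev, word), 1e-4))

score, path = monotone_decode(toy_candidates, toy_lm)
print(path, score)
```

Because the word order is fixed, the search only has to track the last chosen word per position, so the cost grows linearly with sentence length. In this toy setup the language model can override the rule with the higher probability, which is the kind of lexical-choice correction the abstract describes.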

Keywords

Volume 14, Issue 2
Autumn and Winter
Azar 1395 (November/December 2016)