Abstract
This paper reports the optimization of using the out-of-domain data in the Biomedical translation task. We firstly optimized our parallel training dataset using the BabelNet in-domain terminology words. Afterward, to increase the training set, we studied the effects of the out-of-domain data on biomedical translation tasks, and we created a mixture of in-domain and out-of-domain training sets and added more in-domain data using forward translation in the English-Spanish task. Finally, with a simple bpe optimization method, we increased the number of in-domain sub-words in our mixed training set and trained the Transformer model on the generated data. Results show improvements using our proposed method.- Anthology ID:
- 2021.wmt-1.87
- Volume:
- Proceedings of the Sixth Conference on Machine Translation
- Month:
- November
- Year:
- 2021
- Address:
- Online
- Editors:
- Loic Barrault, Ondrej Bojar, Fethi Bougares, Rajen Chatterjee, Marta R. Costa-jussa, Christian Federmann, Mark Fishel, Alexander Fraser, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Paco Guzman, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Tom Kocmi, Andre Martins, Makoto Morishita, Christof Monz
- Venue:
- WMT
- SIG:
- SIGMT
- Publisher:
- Association for Computational Linguistics
- Note:
- Pages:
- 863–867
- Language:
- URL:
- https://aclanthology.org/2021.wmt-1.87
- DOI:
- Bibkey:
- Cite (ACL):
- Bardia Rafieian and Marta R. Costa-jussa. 2021. High Frequent In-domain Words Segmentation and Forward Translation for the WMT21 Biomedical Task. In Proceedings of the Sixth Conference on Machine Translation, pages 863–867, Online. Association for Computational Linguistics.
- Cite (Informal):
- High Frequent In-domain Words Segmentation and Forward Translation for the WMT21 Biomedical Task (Rafieian & Costa-jussa, WMT 2021)
- Copy Citation:
- PDF:
- https://aclanthology.org/2021.wmt-1.87.pdf
Export citation
@inproceedings{rafieian-costa-jussa-2021-high, title = "High Frequent In-domain Words Segmentation and Forward Translation for the {WMT}21 Biomedical Task", author = "Rafieian, Bardia and Costa-jussa, Marta R.", editor = "Barrault, Loic and Bojar, Ondrej and Bougares, Fethi and Chatterjee, Rajen and Costa-jussa, Marta R. and Federmann, Christian and Fishel, Mark and Fraser, Alexander and Freitag, Markus and Graham, Yvette and Grundkiewicz, Roman and Guzman, Paco and Haddow, Barry and Huck, Matthias and Yepes, Antonio Jimeno and Koehn, Philipp and Kocmi, Tom and Martins, Andre and Morishita, Makoto and Monz, Christof", booktitle = "Proceedings of the Sixth Conference on Machine Translation", month = nov, year = "2021", address = "Online", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2021.wmt-1.87", pages = "863--867", abstract = "This paper reports the optimization of using the out-of-domain data in the Biomedical translation task. We firstly optimized our parallel training dataset using the BabelNet in-domain terminology words. Afterward, to increase the training set, we studied the effects of the out-of-domain data on biomedical translation tasks, and we created a mixture of in-domain and out-of-domain training sets and added more in-domain data using forward translation in the English-Spanish task. Finally, with a simple bpe optimization method, we increased the number of in-domain sub-words in our mixed training set and trained the Transformer model on the generated data. Results show improvements using our proposed method.", }
<?xml version="1.0" encoding="UTF-8"?> <modsCollection xmlns="http://www.loc.gov/mods/v3"> <mods ID="rafieian-costa-jussa-2021-high"> <titleInfo> <title>High Frequent In-domain Words Segmentation and Forward Translation for the WMT21 Biomedical Task</title> </titleInfo> <name type="personal"> <namePart type="given">Bardia</namePart> <namePart type="family">Rafieian</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marta</namePart> <namePart type="given">R</namePart> <namePart type="family">Costa-jussa</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2021-11</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the Sixth Conference on Machine Translation</title> </titleInfo> <name type="personal"> <namePart type="given">Loic</namePart> <namePart type="family">Barrault</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Ondrej</namePart> <namePart type="family">Bojar</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Fethi</namePart> <namePart type="family">Bougares</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Rajen</namePart> <namePart type="family">Chatterjee</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Marta</namePart> <namePart type="given">R</namePart> <namePart type="family">Costa-jussa</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christian</namePart> <namePart type="family">Federmann</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Mark</namePart> <namePart type="family">Fishel</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Alexander</namePart> <namePart type="family">Fraser</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Markus</namePart> <namePart type="family">Freitag</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yvette</namePart> <namePart type="family">Graham</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Roman</namePart> <namePart type="family">Grundkiewicz</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Paco</namePart> <namePart type="family">Guzman</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Barry</namePart> <namePart type="family">Haddow</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Matthias</namePart> <namePart type="family">Huck</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Antonio</namePart> <namePart type="given">Jimeno</namePart> <namePart type="family">Yepes</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Philipp</namePart> <namePart type="family">Koehn</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Tom</namePart> <namePart type="family">Kocmi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Andre</namePart> <namePart type="family">Martins</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Makoto</namePart> <namePart type="family">Morishita</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Christof</namePart> <namePart type="family">Monz</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Online</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>This paper reports the optimization of using the out-of-domain data in the Biomedical translation task. We firstly optimized our parallel training dataset using the BabelNet in-domain terminology words. Afterward, to increase the training set, we studied the effects of the out-of-domain data on biomedical translation tasks, and we created a mixture of in-domain and out-of-domain training sets and added more in-domain data using forward translation in the English-Spanish task. Finally, with a simple bpe optimization method, we increased the number of in-domain sub-words in our mixed training set and trained the Transformer model on the generated data. Results show improvements using our proposed method.</abstract> <identifier type="citekey">rafieian-costa-jussa-2021-high</identifier> <location> <url>https://aclanthology.org/2021.wmt-1.87</url> </location> <part> <date>2021-11</date> <extent unit="page"> <start>863</start> <end>867</end> </extent> </part> </mods> </modsCollection>
%0 Conference Proceedings %T High Frequent In-domain Words Segmentation and Forward Translation for the WMT21 Biomedical Task %A Rafieian, Bardia %A Costa-jussa, Marta R. %Y Barrault, Loic %Y Bojar, Ondrej %Y Bougares, Fethi %Y Chatterjee, Rajen %Y Costa-jussa, Marta R. %Y Federmann, Christian %Y Fishel, Mark %Y Fraser, Alexander %Y Freitag, Markus %Y Graham, Yvette %Y Grundkiewicz, Roman %Y Guzman, Paco %Y Haddow, Barry %Y Huck, Matthias %Y Yepes, Antonio Jimeno %Y Koehn, Philipp %Y Kocmi, Tom %Y Martins, Andre %Y Morishita, Makoto %Y Monz, Christof %S Proceedings of the Sixth Conference on Machine Translation %D 2021 %8 November %I Association for Computational Linguistics %C Online %F rafieian-costa-jussa-2021-high %X This paper reports the optimization of using the out-of-domain data in the Biomedical translation task. We firstly optimized our parallel training dataset using the BabelNet in-domain terminology words. Afterward, to increase the training set, we studied the effects of the out-of-domain data on biomedical translation tasks, and we created a mixture of in-domain and out-of-domain training sets and added more in-domain data using forward translation in the English-Spanish task. Finally, with a simple bpe optimization method, we increased the number of in-domain sub-words in our mixed training set and trained the Transformer model on the generated data. Results show improvements using our proposed method. %U https://aclanthology.org/2021.wmt-1.87 %P 863-867
Markdown (Informal)
[High Frequent In-domain Words Segmentation and Forward Translation for the WMT21 Biomedical Task](https://aclanthology.org/2021.wmt-1.87) (Rafieian & Costa-jussa, WMT 2021)
- High Frequent In-domain Words Segmentation and Forward Translation for the WMT21 Biomedical Task (Rafieian & Costa-jussa, WMT 2021)
ACL
- Bardia Rafieian and Marta R. Costa-jussa. 2021. High Frequent In-domain Words Segmentation and Forward Translation for the WMT21 Biomedical Task. In Proceedings of the Sixth Conference on Machine Translation, pages 863–867, Online. Association for Computational Linguistics.