How to Make a Language Translator – Intro to Deep Learning #11



Hello world! It’s Siraj, and today let’s make our own language translator using TensorFlow.

There are about 6,800 different languages spoken across the world, and in an increasingly globalised world nearly every culture interacts with every other culture in some way. That means there are an incalculable number of translation requirements every second of every day. Translating is no easy task: a language isn’t just a collection of words and rules of grammar and syntax, it’s also a vast interconnecting system of cultural references and connotations. This reflects a centuries-old problem: two cultures want to communicate but are blocked by a language barrier. Our translation systems are fast improving, though, so whether it’s an idea, a story, or a quest, each new advancement means one less message will be lost in translation.

During the Second World War, the British government was hard at work trying to decrypt the morse-coded radio communications that Nazi Germany encrypted with the Enigma machine. They hired a man named Alan Turing to help in their effort, and when the American government learned of this codebreaking work they were inspired to attempt machine translation themselves after the war, specifically because they needed a way to keep up with Russian scientific publications. The first public demo of a machine translation system translated 250 words between Russian and English in 1954. It was dictionary based, so it attempted to match the source language to the target language word for word; the results were poor, since it didn’t capture syntactic structure. The second generation of systems used an interlingua: they converted the source language into a special intermediary language with specific rules encoded into it, then generated the target language from that.
This proved more efficient, but the approach was soon overshadowed by the rise of statistical translation in the early 90s, primarily from engineers at IBM. A popular approach was to break the source text down into segments, then compare them to an aligned bilingual corpus, using statistical evidence and probabilities to choose the most likely translation. Nowadays the most used statistical translation system in the world is Google Translate, and with good reason: Google uses deep learning to translate from a given language to another with state-of-the-art results. So how do they do it? Let’s recreate their results in TensorFlow to find out.

The dataset we’ll be using to train our language translation model is a corpus of transcribed TED talks. It’s got both the English version and the French version, and our goal will be to create a model that can translate from one to the other after training. We’ll be using TensorFlow’s built-in data_utils module to help us pre-process our dataset. We’ll start by defining our vocab size, which is the number of words we want to train on from our dataset. We’ll set it to 40k for each language, which covers a small portion of the data. Then we’ll use data_utils to read the data from the data directory, giving it our desired vocab size, and it will return the formatted and tokenised words in both languages. We’ll then initialise TensorFlow placeholders for our encoder and decoder inputs. Both will be integer tensors that represent discrete values; they will be embedded into a dense representation later. We’ll feed our vocabulary words to the encoder, and the encoded representation that’s learnt to the decoder.

Now we can build our model. Google more recently published a paper discussing a system they integrated into their translation service, called Neural Machine Translation. It’s an encoder-decoder model inspired by similar work from other papers on topics like text summarisation.
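The video leans on TensorFlow's data_utils to build the vocabularies and tokenise the corpus. As a rough sketch of what that pre-processing step does — a simplified stand-in in plain Python, not the actual data_utils code (the special-token names mirror the TensorFlow convention, but treat the details as assumptions):

```python
from collections import Counter

def build_vocab(sentences, vocab_size):
    # Count word frequencies across the corpus.
    counts = Counter(w for s in sentences for w in s.lower().split())
    # Reserve ids for special tokens, then keep the most frequent words.
    specials = ["_PAD", "_GO", "_EOS", "_UNK"]
    words = [w for w, _ in counts.most_common(vocab_size - len(specials))]
    return {w: i for i, w in enumerate(specials + words)}

def tokenize(sentence, vocab):
    # Map each word to its integer id, falling back to the _UNK id
    # for words outside the vocabulary.
    return [vocab.get(w, vocab["_UNK"]) for w in sentence.lower().split()]

corpus = ["the cat sat", "the dog sat", "the cat ran"]
vocab = build_vocab(corpus, vocab_size=10)
ids = tokenize("the bird sat", vocab)  # "bird" maps to _UNK
```

These integer id sequences are what get fed into the encoder and decoder placeholders; the dense embedding happens later inside the model.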
Whereas before, Google Translate would translate from language A to English to language B, with this new NMT architecture it can translate directly from one language to the other. It doesn’t memorise phrase-to-phrase translations; instead it encodes the semantics of the sentence. This encoding is generalised, so it can even translate between a language pair, like Japanese and Korean, that it hasn’t explicitly seen before.

So we can use an LSTM recurrent network to encode a sentence in language A. The RNN spits out a hidden state ‘s’, which represents the vectorised contents of the sentence. We can then feed ‘s’ to the decoder, which will generate the translated sentence in language B, word by word. Sounds easy enough, right? Wrong! There is a drawback to this architecture: it has limited memory. That hidden state ‘s’ of the LSTM is where we’re trying to cram the whole sentence we want to translate, and ‘s’ is usually only a few hundred floating-point numbers long. The more we try to force our sentence into this fixed-dimensionality vector, the lossier our neural net is forced to be. We could increase the hidden size of the LSTM — after all, LSTMs are supposed to remember long-term dependencies — but as we increase the hidden size ‘h’ of the LSTM, the training time increases exponentially. So to solve this, we’re going to bring attention into the mix. If I were translating a long sentence, I’d probably glance back at the source sentence a couple of times to make sure I was capturing all the details.
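To see the bottleneck concretely, here is a toy numpy RNN encoder — random weights, a plain tanh cell rather than a real LSTM, purely illustrative — showing how an entire sentence gets squeezed into one fixed-size hidden state ‘s’ no matter how long the sentence is:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, embed_size = 8, 4

# Toy recurrent weights, randomly initialised for illustration.
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_x = rng.normal(scale=0.1, size=(hidden_size, embed_size))

def encode(embedded_words):
    # Run a simple tanh RNN over the sentence; the final hidden
    # state 's' is the only thing handed to the decoder, however
    # many words went in.
    s = np.zeros(hidden_size)
    for x in embedded_words:
        s = np.tanh(W_h @ s + W_x @ x)
    return s

sentence = rng.normal(size=(5, embed_size))  # 5 embedded words
s = encode(sentence)  # always a length-8 vector
```

Whether the sentence has 5 words or 50, ‘s’ stays the same size — that fixed dimensionality is exactly what attention works around.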
I’d iteratively pay attention to the relevant parts of the source sentence. We can let neural nets do the same by letting them store and refer to previous outputs of the LSTM. This increases the storage of our model without changing the functionality of the LSTM. So the idea is, once we have the LSTM outputs from the encoder stored, we can query each output, asking how relevant it is to the computation happening in the decoder. Each encoder output gets a relevancy score, which we can convert to a probability by applying a softmax activation to it. Then we extract a context vector, which is a weighted summation of the encoder outputs depending on how relevant they are. Memory ain’t enough, pay attention! (Repeated in Hindi, German, and Spanish.)

We build our model using TensorFlow’s built-in embedding attention sequence-to-sequence function, giving it our encoder and decoder inputs as well as a few hyperparameters we define, like the number of layers. It builds a model just like the one we discussed; TensorFlow has several built-in models like this that we can drop into our code easily. Normally this alone would be fine — we could run it and the results would be decent — but Google added another improvement to their model that required more code, 100 GPUs, and a week of training. Seriously, that’s what it took. We won’t implement it all programmatically, but let’s dive into it conceptually. If the encoder outputs don’t have sufficient context, the decoder won’t be able to give a good answer; we need to include info about future words, so that each encoder output is determined by the words on both its left and its right. We humans would definitely use this kind of full context to determine the meaning of a word we see in a sentence. The way they did this is to use a bi-directional encoder, so it’s two RNNs.
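The score → softmax → context pipeline just described fits in a few lines of numpy. Dot-product scoring is used here as one common choice of relevancy score — the actual NMT system uses a learned scoring function, so treat this as a sketch of the mechanism, not the production model:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax: scores -> probability distribution.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_context(encoder_outputs, decoder_state):
    # Relevancy score for each stored encoder output: here, its
    # dot product with the current decoder state.
    scores = encoder_outputs @ decoder_state
    # Softmax converts the scores into attention weights.
    weights = softmax(scores)
    # Context vector: weighted sum of the encoder outputs.
    context = weights @ encoder_outputs
    return context, weights

enc = np.array([[1.0, 0.0],   # three stored encoder outputs
                [0.0, 1.0],
                [1.0, 1.0]])
dec = np.array([1.0, 0.0])    # current decoder state
context, weights = attention_context(enc, dec)
```

The decoder receives this context vector at every step, so it can look back at different parts of the source sentence as it generates each target word.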
One goes forward over the sentence, and the other goes backwards. For each word, the encoder concatenates the two vector outputs, which produces a vector with context from both sides. They also added a lot of layers to their model: the encoder has one bi-directional RNN layer and seven uni-directional RNN layers, and the decoder has eight uni-directional RNN layers. The more layers, the longer the training time, and that’s why only a single bi-directional layer is used — if all the layers were bi-directional, each layer would have to finish completely before the layers depending on it could start computing, but by using uni-directional layers, computation can be more parallel. We’ll initialise our TensorFlow session, then our model inside of it. Let’s see some results after training. First I’ll give it this phrase — looks good — and now another phrase. Dope! While it’s not perfect and we still have a way to go, we’re definitely getting closer to having a universal translation model.

Breaking it down: encoder-decoder architectures achieve state-of-the-art performance in machine translation; by storing the previous outputs of the LSTM cells, we can judge the relevancy of each and decide which to use via an attention mechanism; and by using a bi-directional RNN, the context of both past and future words is used to create an accurate encoder output vector.

The coding challenge winner from last week is Ryan Lee. This was very impressive: he created a recipe summariser by scraping 125,000 recipes from the web, and he documented it all beautifully with installation steps so you can reproduce the results yourself. Wizard of the week! The runner-up is Sarah Collins; her code converts scientific papers to text and prioritises them by topic. This week’s coding challenge is to create a simple translation system using an encoder-decoder model. All the details are in the readme; post your GitHub link in the comments and I’ll announce the winner next week.
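As a footnote, the bi-directional concatenation described above reduces to a couple of numpy operations. This toy sketch uses made-up per-word hidden states; the only subtlety is that the backward RNN produces its states right-to-left, so they must be re-aligned to word order before concatenating:

```python
import numpy as np

hidden = 3
# Hypothetical per-word hidden states: the forward RNN emits them
# left-to-right, the backward RNN emits them right-to-left.
fwd = np.arange(12.0).reshape(4, hidden)        # 4 words, forward order
bwd = np.arange(12.0, 24.0).reshape(4, hidden)  # 4 words, reversed order

# Re-align the backward states to word order, then concatenate so
# each word gets a 2*hidden vector with context from both sides.
combined = np.concatenate([fwd, bwd[::-1]], axis=1)
```

Each row of `combined` is the double-width encoder output for one word, carrying context from both directions.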
Please subscribe for more programming videos, check out this related video, and for now I’ve got to go get a better GPU. So, thanks for watching!

98 thoughts on “How to Make a Language Translator – Intro to Deep Learning #11”

  • Gabriel Costanzo Post author

    Can you do a video on a hardware solution? With an arduino, raspberry pi or something like that. Your vids are great, thanks for uploading!

  • Diego Antonio Rosario Palomino Post author

    now that you mention IBM , do they still make any products ? Are they a company that just sits there sucking money out of investors ?

  • chandrashekar D. Post author

    Enigma was used in WW1 not WW2.

  • Lajamerr Mittesdine Post author

    Thank you, Siraj. This probably isn't because of me but I requested this topic in the last videos comments and to my surprise this is the next topic. I'm very happy to see this video.

  • Fred Thomas IV Post author

    great as always

  • chicken Post author

    woah! neat. do you ever plan on doing any videos/tutorials where numpy or other ML libraries aren't used? like, just python and numpy?

  • Amiya Mandal Post author

    read face and play songs according to the mood of that person using machine learning

  • Persist.Resist Post author

    Thankxxx siraj , u r awesome man

  • Amiya Mandal Post author

    or marIO type ai for neural networks game player for counter strike​ using neural networks

  • Bob Smithy Post author

    google translate spits out rubbish Chinese and Japanese

  • Larry Lawrence Post author

    thank you for another tensorflow video

  • Victor Gallagher Post author

    What we now need is and English to English translator.

  • alemmat Post author

    You are the best

  • Oliver Li Post author

    awesome! looking forward to your incorporation of the context, a very important factor. for example,
    In English, "You aren't a student, are you?" (if you are) "Yes."
    In Chinese, "You aren't a student, are you?" (if you are) "No."

    not just the whole article, if you are translating a movie, i think people would be interested in knowing how to label everything and teach the machine to learn them.

    looking forward to that

  • libai tony Post author

    thank your awesome video, siraj.

    Maintain an efficient working learning state, what keys do you think is ?

  • And Phone Post author

    which is best and simple book fr machine learning??

  • Fellipe Cicconi Post author

    Holy crap, this is my new favourite channel on YouTube. [translated from pt-br ;)]

  • Aryo Pradipta Gema Post author

    how to add attention on top of a non seq2seq lstm? i wanna do a text classification and i think attention might help (mathematically)?

  • Sethu Iyer Post author

    Quality of the video is awesome!

  • Anish Josh Post author

    hi siraj what laptop do you use

  • navya zaveri Post author

    Background music? 😉

  • Oliver Edholm Post author

    Cool video Siraj! What's the music in the beginning?

  • DoSell Post author

    love it!

  • Charles Buckman Post author

    Awesome videos!

  • Cycryl Post author

    In the german is should be "Erinnerung" instead of "Speicher" ;3

  • Doğaç Eldenk Post author

    That background is making me sick. Can you use more static background animations next time ?

  • Juubes Post author

    Hey! How do I resize images for my neural network?

  • arpit srivastava Post author

    Hey Sir,how can i start deep learning as i am new to this field.

  • Sukumar H Post author

    from tensorflow.models.rnn.translate import data_utils
    I am getting the error that this package does not exist. How to solve this?

  • SammyKake1 Post author

    I wonder how well this could work with translating source code between programming languages.

  • A BHASKAR chary Post author

    Hi Siraj,
    I am a computer science student recently started working on a project on machine learning. I need to make a bot to learn the game "chain reaction" that i have coded in pygame. I'm stuck on how should I implement the bot. Some help would be really appreciated!!
    Thanks in advance..

  • Abasifreke James Post author

    Hi Siraj, you create LIT content! I love it. Keep up the good work. I'm learning so much to hopefully impress my interviewers at Google. 🙂

  • Olf Mombach Post author

    Google Translate still sucks lol…

  • Marcio Fonseca Post author

    Nice Portuguese pronounciation! 🙂

  • Davids AllEyezOnMe Post author

    Portuguese yeahhh

  • Mirko Plitt Post author

    Is this meant to work with a specific version of TensorFlow? I'm on 1.0.1 and it throws errors ("has no attribute 'rnn_cell'") which are apparently related to some undocumented changes between TF versions. I've also been running into other errors but have been able to figure them out — is this part of the challenge? 😉

  • Rafael Costa Post author

    Love your vids! And the rap parts are awesome! Thank you for showing us how easy is ML. For me as an statistician it's a pleasure to see what you create each week!

  • Izumi Koushiro Post author

    Using Ghibli on your thumbnail is playing dirty, how could I not click.

  • Anish Josh Post author

    is this machine good enough to satisfy all our deep learning stuffs

  • Max Ime Post author

    Great work, love your channel! It's all starting to make sense but still wouldn't be able to write a model for a new problem yet. Also love the rapping, really cool 😉

  • kinsley kajiva Post author

    hey can you make an episode on Latent Sentiment Analysis on score essays
    to a numeric value or grade say 90% or 20% in python. There's is little
    content on YouTube that fully describe Latent Sentiment Analysis ,most
    of they just talk about TF-IDF,so i am looking for more really.

  • Jagmohan Singh Post author

    Hi Siraj,

    I am new to machine learning. I have seen bunch of your videos which are very good and interesting. I have one question, from where do I start as a beginner. Should I continue directly from deep learning or clear some my basics first. I have experience in python so that will not be a problem.

  • Shorts Post author

    Dude how do you reply to every comment without dying of severe carpal tunnel syndrome

  • FranksWorldTV Post author

    You make AI meme-a-licious! 😛

  • البراء Post author

    I really appreciate that you're a professional who's willing to share his expertise, even though I'm not interested in this subject. Unfortunately when senior Chemical engineers retire, they leave the plant they worked at and their 30+yrs knowledge goes with them. They fix problems that appear in the plant with properly logging the events, but only in their minds. So when they retire all the professional knowledge is lost.
    What you are doing here is really special, thank you.

  • Jeff Martel Post author

    your explanation speed is way much better ! Thank you !

  • DanMana1 Post author

    Hi Siraj, could you please do a video on bounding box detection? I really like your videos, thanks for all the effort you put into making them

  • James Bhattacharya Post author

    Siraj, you look very much like Sergey Brin!

  • Vadim Borisov Post author

    thank you for using python3 this time!

  • Josip Vukoja Post author

    Siraj you should get a professional mic with a filter! It will put your videos to a whole another level 😉

  • Natthaphong Phuntusil Post author

    Thank you for your great VDO. I'm learning from you. 🙂

  • Erilyth Post author

    Great video Siraj! Here's my submission, . Training these models takes a very long time though, are there any online services that provide free GPU access for students?

  • TheNemzy Post author

    Coding challenge:

  • Wilson Mar Post author

    [2:28] coding begins
    [7:35] execution statements and responses

  • Jakrit Rungsimanop Post author

    Siraj, can you please show the 10,000 hours project? I like to see how I can shorten the learning process in any subject. thank love u 🙂 🙂

  • Randy Ellis Post author

    When I run:
    from tensorflow.models.rnn.translate import data_utils
    I get an ImportError saying "No module named tensorflow.models"
    Any help, please?
    Thanks in advance.

  • mohamed ahmed Post author

    ? can you make video for face recognition

  • TSA1 Post author

    im coded away

  • Vijay Bharti Post author

    Your codes don't usually work as shown, however videos are encouraging.

  • Vick Nad Post author

    Google translate is horrible when translating to and from a minority language compared to a majority language. This is the sad truth!

  • Kedi miow Post author

    How can I discover a foreign language look like russian? do you tell me?

  • Laura Lee Post author

    Good to know that google translate can be trusted tomorrow, I'm going to translate sentences to Italian because there's a girl in my 6th hour that doesn't speak English and I want to talk to her hope it goes well 🤞🏻

  • ebtesam h Post author

    which version of tensorflow did you used

  • La Tortue PGM Post author

    it's "prend-moi dehors", 'cos attrape is like catching a fish, but prend is take, but catching someone in a sexual way as well, so it's more appropriate to the context. 😉

  • Lolpop HD Post author

    where do i have to put that tranlation zip file?

  • Ron Wein Post author

    Great video as always
    Tnx for sharing!!

    Would it be possible to share the weights of your videos? it would be much better to see results that way for the poor who don't use gpu:)

  • SalRite Post author

    Hello Siraj, its World :-p "Great videos, thanks… "The way you teach/share knowledge is way different from anyone. Your excitement makes vids interesting :-p Though Some of the topics needs more details .

  • Jay Ventura Post author

    Can I ask a sample format for the source language or target language?
    Thank you very much in advance!

  • Zak Jay Post author

    Hey Siraj, I am looking for answers to some questions. may I have your email id? or should I ask here?

  • Matheus D. Rodrigues Post author

    I was sent by Daniel Shiffmann (actualy by you on his channel) and You are amazing!

  • ssains Post author

    can any one please give me a link to how to set up the environment for tensor flow, jupyter etc/??

  • Phunker1 Post author

    Speicher ist nicht genug … pass auf… [German: "Memory is not enough … pay attention…"]

  • yantons Post author

    I am getting below error when running this. Siraj could you please help."Valueerror: Variable proj_w already exists, disallowed. Did you mean to set reuse = True or reuse =AUTO_REUSE in VarScope"

  • Field Marschal Coramine Post author

    Can you make a video over Gene expression microarray data like ACGT… using DL? Thanks in Advance 🙂


  • Jigyasa Sakhuja Post author

    can you please tell me what is a cross-lingual translator ?

  • sathiyan seenivasagam Post author

    siraj im software engineering student same like try use Language Translator application for my final project im new to python language try to run your application and get the idea but application cant be run can you please teach me how to run this application please contact me hope [email protected]

  • Andi Munandar Mappatunru Post author

    mr.Siraj Raval nice to meet you
    my name is nandar im from indonesia

  • Salma Post author

    greetings from Kuwait! I'm intrested in translation field in general and this video helped alot! thank you very much for sharing this well-executed video. subscribed

  • ankish bansal Post author

    Memory ain't enough, Pay attention…..
    I like it very much. Thanks Siraj for this wonderful video.

  • arek b Post author

    By the way, Marian Rejewski was the first polish mathematician who broke Enigma machine in 1932, before WorldWar 2 and before Alan Turing.

  • YouMobile Pakistan Post author

    Just started with OpenNMT, one simple question: Once trained and deployed, does the neural network grow upon each usage? i.e. if I train OpenNMT then deploy it on a server and make it public, will the machine keep learning unsupervised when people use it?
    Does the knowledge base of the machine grow?

  • Vishal Khatri Post author

    have you done any Hindi to English translator . if yes then give me the code or this same code works in it…thankyou…..reply as soon as possible brother….

  • Immortal Coders Post author

    bro is it possiple to find which is their native country based on the photos through giving collection of photo he is indian and he is britain

  • Mohammad Reyaan Post author

    please guide me . . How i make portable language translator offline device with the help of arduino or pi ?

  • Prasad Katkade Post author

    I am newbie here can anybody tell me how to solve error "no module named data_utils "

  • Ponape4 Post author

    SMT and neural machine translation are great, but they are not useful for minority languages which do not have parallel corpora (Quechua, Wolof, Chamorro…), and that's a pity.

  • Mr WhiteHawk Post author

    Translation between policies and pseudo code instructions will create an enormous shift in paradigm in ?.

  • Ashley Hunter Post author

    I thought I might be a clever person. I watched this video. I am not a clever person. The end.

  • King15kunal Post author

    Hey Siraj could you please make a video on NMT (Neural machine translation), which is one of the advanced machine translation methods.

  • Shocko Post author

    Out of date

  • AhmadZia yosfi Post author

    2:22 PASHTO 😉

  • Badr Otaibi Post author

    u r Great

  • Uttam Dwivedi Post author

    Hi, I am using this tutorial to create a webpage, where anyone can upload the document (pdf, docx) to translate the file. Can anyone help me on how to get the pdf document in python and extract the texts to translate it?

  • Daydreaming Engeniering Post author

    I am not in this level of education…but i was plotting on making a deep learning ai system you can carry on with mic for listening and small form monitor you can carry on and it translates in real time words spoken in any language. Again, i know nothing about this stuff but it seems like its well on its way…reguardless….if the system is made maybe it can be incorporated into a dfesign like the idea i got….maybe the lack of knolege comes from the lack of having that type of hardware…i love learning things

  • Nur Aisyah Post author

    Hi. did anyone know how's to create a transliteration machine learning that can solved homograph disambiguation using python?

Leave a Reply

Your email address will not be published. Required fields are marked *