Last time, I showed how to create transcripts of YouTube lectures effortlessly, for just pennies, using LangChain and OpenAI. This time, I want to illustrate how to translate any newspaper article from one language to another. Would you like to read articles from Le Monde, El Pais or ANSA in Italy, but cannot read the language? You can download and parse the article with a Python library (newspaper), save it as text, and translate it into any language of your choice with the OpenAI model you prefer, as long as you have an OpenAI API key.
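If you would rather not keep the key in a file, one option is to read it from the `OPENAI_API_KEY` environment variable, which the official `openai` package also looks at by default, and fall back to a key file only when the variable is not set. The sketch below assumes exactly that setup; the function name `get_openai_key` is hypothetical and is not part of the script at the end of the article.

```python
import os
from pathlib import Path
from typing import Optional


def get_openai_key(keyfile: Optional[str] = None) -> str:
    """Return an OpenAI API key (hypothetical helper).

    Prefers the OPENAI_API_KEY environment variable; otherwise reads the
    whole contents of `keyfile` (a path you supply) and strips whitespace.
    """
    # 1. Environment variable, if set and non-empty
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if key:
        return key
    # 2. Fallback: a key file, if one was given
    if keyfile is not None:
        key = Path(keyfile).read_text(encoding="utf-8").strip()
        if key:
            return key
    raise RuntimeError("No API key found in OPENAI_API_KEY or in the key file")
```

Inside `load_openai_key`, you could then write `key = get_openai_key("path_to_key")` instead of opening the file directly.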
The code is short and simple, and it lets you quickly broaden your horizons. This is a basic version: you can add error handling or customise it as needed. I apologise in advance for any errors; feel free to point them out in the comments. The full Python code is shown at the end of the article.
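One way to add error handling is to wrap each API call in a retry loop with exponential backoff, so that a temporary failure (a rate limit or a dropped connection) does not abort the whole translation. The helper below is a generic sketch under that assumption; `call_with_retries` is an illustrative name, not a function from the article's script or from any library.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fn(), retrying with exponential backoff on the given exception types."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Wait base_delay * 2^(attempt - 1) seconds plus up to 10% random jitter
            delay = base_delay * 2 ** (attempt - 1)
            time.sleep(delay + random.uniform(0, delay * 0.1))
    raise RuntimeError("max_attempts must be at least 1")  # only reached if the loop never ran
```

For example, `call_with_retries(lambda: translate_text_openai(text, "fr", "en"))`. In practice you would pass, via `retry_on`, the exception classes your installed version of the `openai` package raises for rate limits and connection errors (check the names for your version); retrying on every `Exception`, the default here, also retries errors that can never succeed, such as an invalid API key.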
Here is an example of a translation (I show only part of the longer article, but the full link is given).
https://www.lemonde.fr/international/article/2024/01/26/en-mer-rouge-les-etats-unis-detruisent-un-missile-houthiste-tire-depuis-le-yemen-vers-un-navire-militaire-americain_6213226_3210.html
Les Etats-Unis ont annoncé, vendredi 26 janvier, avoir détruit un missile balistique tiré « depuis les zones du Yémen contrôlées par les houthistes », alors que ces rebelles proches de l’Iran multiplient les attaques contre la marine marchande en mer Rouge et dans le golfe d’Aden. Le missile antinavire se dirigeait vers l’endroit où se trouvait un destroyer de classe Arleigh-Burke et il a été détruit sans faire « ni blessés, ni dommages », a annoncé le Commandement militaire américain au Moyen-Orient.
Les houthistes ont eux annoncé avoir tiré des « missiles » contre un « pétrolier britannique, le Marlin-Luanda », précisant que le navire, « touché de plein fouet, a pris feu ». Le porte-parole militaire des rebelles, Yahya Saree, a ajouté dans son communiqué que l’attaque a été menée en soutien au peuple palestinien et « en réponse à l’agression britannique et américaine contre notre pays ». Pour l’heure, il n’est pas possible de déterminer s’il s’agit du même incident.
(Translation to English)
The United States announced on Friday, January 26, that they destroyed a ballistic missile fired "from the areas of Yemen controlled by the Houthis," as these rebels close to Iran continue to attack merchant ships in the Red Sea and the Gulf of Aden. The anti-ship missile was heading towards a location where an Arleigh-Burke class destroyer was located, and it was destroyed without causing "any casualties or damage," announced the US Central Command in the Middle East.
The Houthis themselves announced that they had fired "missiles" at a "British tanker, the Marlin-Luanda," stating that the ship, "directly hit, caught fire." The rebels' military spokesperson, Yahya Saree, added in his statement that the attack was carried out in support of the Palestinian people and "in response to British and American aggression against our country." For now, it is not possible to determine if it is the same incident.
The full Python code is shown here:
```python
import os

import openai  # written for openai>=1.0 (see the comment in translate_text_openai)
from newspaper import Article, Config  # pip install newspaper3k
from pathvalidate import sanitize_filename


def load_openai_key():
    keyfile = "path_to_key"  # modify as needed
    with open(keyfile, 'r') as f:
        key = f.read().strip()
    openai.api_key = key
    os.environ["OPENAI_API_KEY"] = key


def translate_text_openai(text, source_language, target_language):
    chunk_size = 3000  # desired chunk size (in characters) for each request
    # Split the text into fixed-size chunks so each request stays small
    text_chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    translated_chunks = []
    for chunk in text_chunks:
        prompt = f"Translate and re-read the translation to ensure it makes sense and that it is syntactically and grammatically correct from the following '{source_language}' text to '{target_language}': {chunk}"
        messages = [
            {"role": "system", "content": "You are a professional translator that translates text professionally and accurately."},
            {"role": "user", "content": prompt},
        ]
        # openai>=1.0 interface. With openai<1.0, use openai.ChatCompletion.create(...)
        # and response['choices'][0]['message']['content'] instead.
        response = openai.chat.completions.create(
            model="gpt-3.5-turbo",  # modify as needed
            messages=messages,
            temperature=0.5,
        )
        translated_chunks.append(response.choices[0].message.content.strip())
    # Join the translated chunks into a single translated text
    return ' '.join(translated_chunks)


def read_text(input_file):
    with open(input_file, 'r', encoding='utf-8') as f:
        return f.read()


def save_text(text, output_file):
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(text)


def newspaper_text_extraction(article_url):
    # Present a browser user agent; some sites refuse the default one
    user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
    config = Config()
    config.browser_user_agent = user_agent
    article = Article(article_url, config=config)
    article.download()
    article.parse()
    return article


def main(url, source_language, target_language):
    load_openai_key()
    article = newspaper_text_extraction(url)
    # sanitize_filename (unlike sanitize_filepath) also removes "/" from the title,
    # so the title cannot be interpreted as a directory path
    input_file = sanitize_filename(article.title + ".txt")
    text = article.url + "\n\n" + article.text
    save_text(text, input_file)
    text = read_text(input_file)
    translated_text = translate_text_openai(text, source_language, target_language)
    # os.path.splitext strips only the extension, even if the title contains dots
    output_file = os.path.splitext(input_file)[0] + "_" + target_language + ".txt"
    save_text(article.url + "\n\n" + translated_text, output_file)


if __name__ == '__main__':
    # modify as needed
    url = "https://www.scientificamerican.com/article/brains-are-not-required-when-it-comes-to-thinking-and-solving-problems-simple-cells-can-do-it/"
    # translates from English, en, to Italian, it
    main(url, "en", "it")
```
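`translate_text_openai` cuts the text every 3,000 characters, so a chunk can end in the middle of a word or sentence and the model receives a fragment to translate. A possible refinement is to group whole paragraphs into chunks and fall back to fixed-size slicing only for a paragraph that is longer than the limit on its own. The sketch below assumes paragraphs are separated by blank lines, as they typically are in the text returned by newspaper; `chunk_by_paragraphs` is a hypothetical helper, not part of the script above.

```python
from typing import List


def chunk_by_paragraphs(text: str, max_chars: int = 3000) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of at most max_chars characters."""
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue  # skip empty paragraphs
        # A paragraph longer than max_chars is sliced into fixed-size pieces
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate  # piece still fits in the current chunk
            else:
                chunks.append(current)  # close the full chunk and start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

To try it, you could replace the list comprehension in `translate_text_openai` with `text_chunks = chunk_by_paragraphs(text, chunk_size)` and join the translated chunks with `"\n\n"` instead of a space, so the paragraph breaks survive translation. Note that the limit is counted in characters, not model tokens.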