LangChain Meets GPT-3.5: Crafting the Ultimate Multilingual News Articles Summarizer in English and French
Streamlined Summaries Across Languages with Cutting-Edge Technology
Introduction
Staying on top of current news matters, yet sifting through numerous articles is tedious. To streamline this process, we're introducing a News Articles Summarizer built with GPT-3.5 and LangChain. The tool scrapes web articles, extracts their titles and text, and produces concise summaries. In this guide, we'll walk through the creation of this summarizer step by step.
Workflow for Building a News Articles Summarizer
- Installing required libraries: To get started, make sure the necessary libraries are installed: requests, newspaper3k, and langchain.
- Scraping articles: Use the requests library to fetch the content of the target news articles from their URLs.
- Extracting titles and text: Use the newspaper library to parse the scraped HTML and extract each article's title and text.
- Preprocessing the text: Clean and preprocess the extracted text so it is suitable as input to the GPT-3.5 model.
- Generating summaries: Use the GPT-3.5 model to summarize the extracted articles.
- Outputting the results: Present the summaries along with the original titles, so readers can quickly grasp the main points of each article.
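The preprocessing step in the workflow above is not spelled out in this guide, so here is a minimal sketch of what it might look like. The helper name `clean_article_text` and the `max_chars` default are assumptions made for illustration; the cutoff is an arbitrary guard against very long articles, not a documented model limit.

```python
import re


def clean_article_text(text: str, max_chars: int = 12000) -> str:
    """Normalize whitespace and cap the length of scraped article text.

    Hypothetical helper: max_chars is an arbitrary illustrative cutoff.
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Collapse runs of spaces and tabs, and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]

    # Drop empty lines and separate the remaining paragraphs with one blank line.
    cleaned = "\n\n".join(line for line in lines if line)

    # Truncate overly long text at the last space to avoid cutting a word in half.
    if len(cleaned) > max_chars:
        cleaned = cleaned[:max_chars].rsplit(" ", 1)[0]

    return cleaned
```

The cleaned string could then be used in place of the raw article text when filling the prompt.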
- Installing dependencies
!pip install -q openai langchain newspaper3k python-dotenv requests
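Create a .env file in your project root directory and add your OpenAI API key to it as an environment variable:
# Write the key to .env (notebook shell command); replace the placeholder with your own key.
!echo "OPENAI_API_KEY='<OPENAI_API_KEY>'" > .env
# Load the variables defined in .env into the process environment.
from dotenv import load_dotenv
load_dotenv()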
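Before calling the model, it can help to confirm that the key was actually loaded. The following is a minimal sketch, not part of the original workflow: `require_env` is a hypothetical helper, and treating a value that starts with `<` as an unfilled placeholder is an assumption based on the placeholder used above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration only.
    """
    value = os.getenv(name, "").strip()
    # Assumption: a value starting with "<" is the unfilled placeholder.
    if not value or value.startswith("<"):
        raise RuntimeError(f"{name} is missing or still set to a placeholder")
    return value


# Example usage after load_dotenv():
# api_key = require_env("OPENAI_API_KEY")
```

Failing early here gives a clearer message than an authentication error raised later by the API call.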
- Scraping and extracting the title and text of the article using the requests and newspaper libraries
import requests
from newspaper import Article

# A browser-like User-Agent; some sites reject the default one sent by requests.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

article_url = "https://www.wired.com/story/fast-forward-chatgpt-my-new-chatbot-friend-get-things-done/"

session = requests.Session()

try:
    # Fetch the page first so HTTP errors and timeouts surface here.
    response = session.get(article_url, headers=headers, timeout=10)

    if response.status_code == 200:
        article = Article(article_url)
        # Parse the HTML fetched above instead of downloading the page a second time.
        article.download(input_html=response.text)
        article.parse()

        print(f"Title: {article.title}")
        print(f"Text: {article.text}")
    else:
        print(f"Failed to fetch article at {article_url} (status {response.status_code})")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")
The code above prints the article's title and full text.
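The snippet above handles a single article. To process several, one option is to loop over a list of URLs and keep going when one of them fails. The sketch below assumes a hypothetical `fetch_one` callable that wraps the download-and-parse logic and returns a `(title, text)` tuple; both `fetch_many` and `fetch_one` are illustrative names, not part of requests or newspaper.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def fetch_many(
    urls: Iterable[str],
    fetch_one: Callable[[str], Tuple[str, str]],
) -> Tuple[List[Dict[str, str]], Dict[str, str]]:
    """Fetch several articles, collecting failures instead of stopping.

    fetch_one is a hypothetical callable returning (title, text) for a URL.
    """
    articles: List[Dict[str, str]] = []
    failures: Dict[str, str] = {}

    for url in urls:
        try:
            title, text = fetch_one(url)
        except Exception as exc:
            # Record the error message and move on to the next URL.
            failures[url] = str(exc)
            continue
        articles.append({"url": url, "title": title, "text": text})

    return articles, failures
```

Returning the failures alongside the results makes it easy to report which articles could not be summarized.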
- Generating the summaries of the article using gpt-3.5-turbo
The next snippet imports the required classes from LangChain and sets up a ChatOpenAI instance with a temperature of 0, which keeps the output as deterministic as possible. It also imports HumanMessage, the message schema class used to wrap the prompt for the chat model. The code starts by defining the prompt template and filling it with the article’s title and text.
from langchain.schema import HumanMessage
from langchain.chat_models import ChatOpenAI

# Prompt template; the placeholders are filled with the scraped article below.
template = """You are a very good assistant that summarizes online articles.
Here's the article you want to summarize.
==================
Title: {article_title}
{article_text}
==================
Write a summary of the previous article.
"""
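# Fill the template with the scraped title and text.
prompt = template.format(article_title=article.title, article_text=article.text)
messages = [HumanMessage(content=prompt)]
# temperature=0 keeps the output as deterministic as possible.
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# Send the message to the model and print the generated summary.
summary = chat(messages)
print(summary.content)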
The code above prints the model's summary of the article.
If we want a bulleted list, we can modify the prompt as shown below.
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists.
Here's the article you need to summarize.
==================
Title: {article_title}
{article_text}
==================
Now, provide a summarized version of the article in a bulleted list format.
"""
prompt = template.format(article_title=article.title, article_text=article.text)
summary = chat([HumanMessage(content=prompt)])
print(summary.content)
The output of the code is shown below:
- The author used an open source program called Auto-GPT to find the email address of the CEO of Lindy AI.
- Auto-GPT suggested a plan and attempted to find the email address through web searches and checking the CEO's LinkedIn profile.
- The program also tried guessing the email address based on commonly used formats and used email verification services to check its guesses.
- None of the attempts were successful, but the program saved the addresses for the author to try emailing them.
- The author eventually made their own guess and found the correct email address.
- The experience with Auto-GPT highlights the potential of AI assistants like ChatGPT to perform a wide range of tasks.
- However, there are concerns about the safety and reliability of AI agents when handling riskier tasks.
- The CEO of Lindy AI believes that AI agents could replace certain office workers and lead to the disappearance of some professions.
To obtain a summary in French, we can instruct the model to write it in French. Keep in mind, however, that GPT-3.5's training data is predominantly English. Although the model has multilingual abilities, output quality may be less consistent for non-English languages. Here is one way to adjust the prompt.
template = """You are an advanced AI assistant that summarizes online articles into bulleted lists in French.
Here's the article you need to summarize.
==================
Title: {article_title}
{article_text}
==================
Now, provide a summarized version of the article in a bulleted list format, in French.
"""
prompt = template.format(article_title=article.title, article_text=article.text)
summary = chat([HumanMessage(content=prompt)])
print(summary.content)
The output of the code is shown below:
- Auto-GPT est un programme open source qui peut aider à trouver des informations en ligne, comme l'adresse e-mail du PDG d'une startup appelée Lindy AI.
- Auto-GPT effectue une recherche sur le web pour trouver l'adresse e-mail du PDG de Lindy AI, mais ne parvient pas à la trouver.
- Le programme suggère ensuite de deviner l'adresse e-mail en se basant sur des formats couramment utilisés.
- Auto-GPT utilise différents services de vérification d'adresses e-mail pour vérifier ses suppositions, mais aucune ne s'avère valide.
- Auto-GPT enregistre les adresses dans un fichier sur l'ordinateur de l'utilisateur et suggère d'essayer de les contacter par e-mail.
- L'article souligne que les chatbots comme ChatGPT peuvent accomplir une grande variété de tâches sophistiquées grâce à leur capacité à répondre à de nombreuses questions.
- Cependant, il est important de développer des agents intelligents qui soient également sûrs pour éviter les problèmes potentiels.
- Le PDG de Lindy AI pense que les agents d'intelligence artificielle pourraient remplacer certains employés de bureau à l'avenir et prédit la disparition de certaines professions.
Behind the scenes, the code first gathers the article's title and text. It then builds a prompt that positions the model as an assistant tasked with summarizing the article as French bullet points, and fills that prompt with the article's data. The gpt-3.5-turbo model is instantiated with a temperature of 0 to limit randomness in the output. Finally, the formatted prompt is passed to the model, which generates the summary according to those instructions.
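Since the three prompts in this guide differ only in output format and language, they could be folded into one helper. The sketch below is an illustration under that assumption: `build_summary_prompt` and its parameters are hypothetical names, and the wording is adapted from the templates above rather than copied exactly.

```python
def build_summary_prompt(
    title: str,
    text: str,
    language: str = "English",
    bulleted: bool = False,
) -> str:
    """Build a summarization prompt for a given language and output format.

    Hypothetical helper; the wording is adapted from the templates above.
    """
    output_format = "a bulleted list" if bulleted else "a short paragraph"
    return (
        "You are an advanced AI assistant that summarizes online articles.\n"
        "Here's the article you need to summarize.\n"
        "==================\n"
        f"Title: {title}\n"
        f"{text}\n"
        "==================\n"
        f"Now, write a summary of the article as {output_format}, in {language}.\n"
    )
```

The returned string can be wrapped in a HumanMessage and passed to the chat model exactly as in the earlier snippets, for example with `language="French"` and `bulleted=True` for the French bullet-point summary.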
Conclusion
To wrap up, we've walked through building a News Article Summarizer with LangChain and GPT-3.5. By presenting summaries as bullet points, the tool distills long articles for easy reading, and by prompting the model for another language, with French as the example here, it can serve a wider audience. The step-by-step guide should also help anyone who wants to stay up to date without spending time reading long news articles online.
If you want to contribute or you find any errors in this article, please leave me a comment.
You can reach out to me on any of the Matrix decentralized servers. My Element messenger ID is @maximilien:matrix.org
If you are on one of the Mastodon decentralized servers, here is my ID: @maximilien@qoto.org
If you are on LinkedIn, you can reach me here
If you want to contact me via email: maximilien@maxtekai.tech
If you want to hire me to work on machine learning, data science, IoT and AI-related projects, please reach out to me here
Warm regards,
Maximilien.