LangChain Meets GPT-3.5: Crafting the Ultimate Multilingual News Articles Summarizer in English and French

Streamlined Summaries Across Languages with Cutting-Edge Technology

Introduction

Staying on top of current news matters, yet sifting through numerous articles is tedious. To streamline the process, we're introducing a News Articles Summarizer built with GPT-3.5 and LangChain. The tool scrapes web articles, extracts their titles and text, and produces concise summaries. In this guide, we'll go through the creation of this summarizer step by step.

Workflow for Building a News Articles Summarizer

  1. Installing the required libraries: To get started, make sure the necessary libraries are installed: requests, newspaper3k, and langchain.

  2. Scraping articles: Use the requests library to fetch the content of the target news articles from their URLs.

  3. Extracting titles and text: Use the newspaper library to parse the scraped HTML and extract the title and text of each article.

  4. Preprocessing the text: Clean and preprocess the extracted text to make it suitable as input to the GPT-3.5 model.

  5. Generating summaries: Use the GPT-3.5 model to summarize the extracted articles.

  6. Outputting the results: Present the summaries along with the original titles, so users can quickly grasp the main points of each article.
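The preprocessing step is not shown in the code later in this guide, so here is a minimal sketch of what it could look like. The helper name `clean_article_text` and the `max_chars` default are assumptions made for illustration; they are not part of newspaper3k or LangChain.

```python
import re


def clean_article_text(text: str, max_chars: int = 12000) -> str:
    """Normalize whitespace and cap the length of scraped article text.

    Hypothetical helper: the name and the max_chars default are
    illustrative assumptions, not values taken from any library.
    """
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop spaces that sit directly next to a line break.
    text = re.sub(r" ?\n ?", "\n", text)
    # Squeeze three or more newlines down to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    text = text.strip()
    # Cut very long articles at a word boundary to keep the prompt short.
    if len(text) > max_chars:
        text = text[:max_chars].rsplit(" ", 1)[0]
    return text
```

The cleaned string could then be used in place of the raw article text when filling the prompt. A character limit is only a rough stand-in for a token limit, so treat the default as a starting point.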


  1. Installing dependencies
!pip install -q openai langchain newspaper3k python-dotenv  requests

Create a .env file in your project root directory and add your OpenAI API key to it as the OPENAI_API_KEY environment variable:

from dotenv import load_dotenv

!echo "OPENAI_API_KEY='<OPENAI_API_KEY>'" > .env

load_dotenv()
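A missing or placeholder key only surfaces later as an authentication error, so it can help to check it right after loading the .env file. The sketch below is one way to do that; `require_env` is a hypothetical helper name, and the rule that a value starting with `<` is a placeholder is an assumption based on the `<OPENAI_API_KEY>` example above.

```python
import os


def require_env(name: str, env=os.environ) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper: treats an empty value, or one that still looks
    like the "<...>" placeholder, as missing.
    """
    value = env.get(name, "").strip()
    if not value or value.startswith("<"):
        raise RuntimeError(
            f"{name} is missing or still set to a placeholder; check your .env file."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` after `load_dotenv()` would then fail early with a readable message instead of at the first model call.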
  2. Scraping and extracting the title and text of the article using the requests and newspaper libraries
import requests
from newspaper import Article

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'
}

# URL of the article to summarize
article_url = "https://www.wired.com/story/fast-forward-chatgpt-my-new-chatbot-friend-get-things-done/"

session = requests.Session()

try:
    # Fetch the page with a browser-like User-Agent and a 10-second timeout
    response = session.get(article_url, headers=headers, timeout=10)

    if response.status_code == 200:
        # Parse the HTML we already fetched instead of downloading it a second time
        article = Article(article_url)
        article.download(input_html=response.text)
        article.parse()

        print(f"Title: {article.title}")
        print(f"Text: {article.text}")

    else:
        print(f"Failed to fetch article at {article_url} (status {response.status_code})")
except Exception as e:
    print(f"Error occurred while fetching article at {article_url}: {e}")
Output of the above code:
Title: Enough Talk, ChatGPT—My New Chatbot Friend Can Get Things Done
Text: I recently needed to contact the CEO of a startup called Lindy, a company developing personal assistants powered by artificial intelligence. Instead of looking for it myself, I turned to an AI helper of my own, an open source program called Auto-GPT, typing in “Find me the email address of the CEO of Lindy AI.” Like a delightfully enthusiastic intern, Auto-GPT began furiously Googling and browsing the web for answers, providing a running commentary designed to explain its actions as it went. “A web search is a good starting point to gather information about the CEO and their email address,” it told me. When given a task like finding a startup CEO's email address, the open source Auto-GPT suggests a plan for approval and can attempt to put it into action. Auto-GPT via Will Knight “I found several sources mentioning Flo Crivello as the CEO of Lindy.ai, but I haven't found their email address yet,” Auto-GPT reported. “I will now check Flo Crivello’s LinkedIn profile for their email address,” it said. That didn’t work either, so the program then suggested it could guess Crivello’s email address based on commonly used formats. After I gave it permission to go ahead, Auto-GPT used a series of different email verification services it found online to check if any of its guesses might be valid. None provided a clear answer, but the program saved the addresses to a file on my computer, suggesting I might want to try emailing them all. Who am I to question a friendly chatbot? I tried them all, but every email bounced back. Eventually, I made my own guess at Crivello’s email address based on past experience, and I got it right the first time. Auto-GPT failed me, but it got close enough to illustrate a coming shift in how we use computers and the web.
The ability of bots like ChatGPT to answer an incredible variety of questions means they can correctly describe how to perform a wide range of sophisticated tasks. Connect that with software that can put those descriptions into action and you have an AI helper that can get a lot done. Of course, just as ChatGPT will sometimes produce confused messages, agents built that way will occasionally—or often—go haywire. As I wrote this week, while searching for an email address is relatively low-risk, in the future agents might be tasked with riskier business, like booking flights or contacting people on your behalf. Making agents that are safe as well as smart is a major preoccupation of projects and companies working on this next phase of the ChatGPT era. When I finally spoke to Crivello of Lindy, he seemed utterly convinced that AI agents will be able to wholly replace some office workers, such as executive assistants. He envisions many professions simply disappearing.
  3. Generating the summaries of the article using gpt-3.5-turbo

The code below imports the ChatOpenAI class and the HumanMessage schema class from LangChain. It builds a prompt from a template filled with the article's title and text, then sends it to a ChatOpenAI instance. The temperature is set to 0 to keep the output as deterministic as possible.

from langchain.schema import HumanMessage
from langchain.chat_models import ChatOpenAI

article_title = article.title

template = """You are a very good assistant that summarizes online articles.

Here's the article you want to summarize.

==================
Title: {article_title}

{article_text}
==================

Write a summary of the previous article.
"""

prompt = template.format(article_title=article.title, article_text=article.text)

messages = [HumanMessage(content=prompt)]

chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)  

summary = chat(messages)
print(summary.content)
Output of the code above:
The article discusses the capabilities of AI chatbots, specifically Auto-GPT, in performing tasks and getting things done. The author shares their experience using Auto-GPT to find the email address of the CEO of a startup called Lindy. Although Auto-GPT was not successful in finding the email address, it demonstrated the potential of AI chatbots to perform a wide range of tasks. The article also highlights the importance of ensuring the safety and reliability of AI agents as they take on more complex and risky tasks in the future. The CEO of Lindy believes that AI agents have the potential to replace certain office workers and transform various professions.

If we want a bulleted list, we can modify the prompt as shown below.


template = """You are an advanced AI assistant that summarizes online articles into bulleted lists.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article in a bulleted list format.
"""


prompt = template.format(article_title=article.title, article_text=article.text)

summary = chat([HumanMessage(content=prompt)])
print(summary.content)

The output of the code is shown below:

- The author used an open source program called Auto-GPT to find the email address of the CEO of Lindy AI.
- Auto-GPT suggested a plan and attempted to find the email address through web searches and checking the CEO's LinkedIn profile.
- The program also tried guessing the email address based on commonly used formats and used email verification services to check its guesses.
- None of the attempts were successful, but the program saved the addresses for the author to try emailing them.
- The author eventually made their own guess and found the correct email address.
- The experience with Auto-GPT highlights the potential of AI assistants like ChatGPT to perform a wide range of tasks.
- However, there are concerns about the safety and reliability of AI agents when handling riskier tasks.
- The CEO of Lindy AI believes that AI agents could replace certain office workers and lead to the disappearance of some professions.
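If the bullets need to be stored or displayed individually, the model's text output can be split into a Python list. This is a minimal sketch: `parse_bullets` is a hypothetical helper, and it assumes each bullet sits on its own line and starts with `-`, `*`, or `•`, which the model is not guaranteed to do.

```python
def parse_bullets(summary_text: str) -> list[str]:
    """Split a bulleted summary into a list of bullet strings.

    Hypothetical helper: lines that do not start with a bullet
    marker are ignored.
    """
    bullets = []
    for line in summary_text.splitlines():
        line = line.strip()
        # Accept the common markers a model may emit: "-", "*", or "•".
        if line[:1] in ("-", "*", "•"):
            bullets.append(line[1:].strip())
    return bullets
```

With the snippet above, `parse_bullets(summary.content)` would yield one string per bullet.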

To obtain a summary in French, we can instruct the model to write it in French. Keep in mind, however, that GPT-3.5 was trained primarily on English data. Although it has multilingual abilities, the quality of its output may be less consistent in other languages. Here's a way to adjust the prompt.


template = """You are an advanced AI assistant that summarizes online articles into bulleted lists in French.

Here's the article you need to summarize.

==================
Title: {article_title}

{article_text}
==================

Now, provide a summarized version of the article in a bulleted list format, in French.
"""

prompt = template.format(article_title=article.title, article_text=article.text)

summary = chat([HumanMessage(content=prompt)])
print(summary.content)

The output of the code is shown below:

- Auto-GPT est un programme open source qui peut aider à trouver des informations en ligne, comme l'adresse e-mail du PDG d'une startup appelée Lindy AI.
- Auto-GPT effectue une recherche sur le web pour trouver l'adresse e-mail du PDG de Lindy AI, mais ne parvient pas à la trouver.
- Le programme suggère ensuite de deviner l'adresse e-mail en se basant sur des formats couramment utilisés.
- Auto-GPT utilise différents services de vérification d'adresses e-mail pour vérifier ses suppositions, mais aucune ne s'avère valide.
- Auto-GPT enregistre les adresses dans un fichier sur l'ordinateur de l'utilisateur et suggère d'essayer de les contacter par e-mail.
- L'article souligne que les chatbots comme ChatGPT peuvent accomplir une grande variété de tâches sophistiquées grâce à leur capacité à répondre à de nombreuses questions.
- Cependant, il est important de développer des agents intelligents qui soient également sûrs pour éviter les problèmes potentiels.
- Le PDG de Lindy AI pense que les agents d'intelligence artificielle pourraient remplacer certains employés de bureau à l'avenir et prédit la disparition de certaines professions.

Behind the scenes, the code first gathers the article's title and text. It then builds a prompt that casts the model as an assistant tasked with summarizing the article as a bulleted list in French, and fills the prompt with the article's data. The gpt-3.5-turbo model is instantiated with a temperature of 0 to reduce randomness in the output. Finally, the formatted prompt is passed to the model, which generates the summary according to the instructions in the prompt.
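The three templates in this guide differ only in output format and language, so they can be folded into a single function. The sketch below is one possible way to do it; `build_summary_prompt` and its `language` and `bulleted` parameters are hypothetical names, and the wording is adapted from the templates above rather than copied exactly.

```python
def build_summary_prompt(article_title: str, article_text: str,
                         language: str = "English", bulleted: bool = False) -> str:
    """Build a summarization prompt for a given output language and format.

    Hypothetical helper: the parameter names and the exact prompt
    wording are illustrative assumptions.
    """
    style = "a bulleted list" if bulleted else "a short paragraph"
    return (
        f"You are an advanced AI assistant that summarizes online articles in {language}.\n\n"
        "Here's the article you need to summarize.\n\n"
        "==================\n"
        f"Title: {article_title}\n\n"
        f"{article_text}\n"
        "==================\n\n"
        f"Now, write a summary of the article as {style}, in {language}.\n"
    )
```

The returned string can be wrapped in `HumanMessage(content=...)` and passed to the `chat` object exactly as in the earlier snippets, for example with `language="French"` and `bulleted=True`.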

Conclusion

To wrap up, we've walked through building a News Articles Summarizer with LangChain and GPT-3.5. The tool condenses long articles into paragraph or bulleted summaries, and it can serve a wider audience by producing them in other languages, with French as the example here. This step-by-step guide should help anyone who wants to stay up to date without reading long news articles in full.


If you would like to contribute or you find any errors in this article, please leave me a comment.

You can reach out to me on any of the Matrix decentralized servers. My Element messenger ID is @maximilien:matrix.org

If you are on one of the Mastodon decentralized servers, here is my ID @

If you are on LinkedIn, you can reach me here

If you want to contact me via email: maximilien@maxtekai.tech

If you want to hire me to work on machine learning, data science, IoT and AI-related projects, please reach out to me here

Warm regards,

Maximilien.
