Evaluating LLM text summarisation: Claude vs. Bard vs. GPT-4

Claude, Bard or GPT-4: which LLM is best for summarising text? In this article, we look at each one in turn and evaluate its output.

LLMs in action: starting point

Large Language Models (LLMs) have emerged as vital tools for summarising and understanding data, and businesses increasingly use them to make informed decisions. In this blog post, we evaluate how three leading LLMs perform at summarising text: Claude, Bard, and GPT-4.

For this assessment, we asked all three language models to summarise the same text: an article on the significance of digital transformation, the challenges SMBs (small and medium businesses) often grapple with, the importance of leveraging data analytics expertise, and the tangible benefits of outsourcing. Interested in reading the complete article? You can do so here.

Claude, Bard and GPT-4: a bit of context

Bard is a conversational Artificial Intelligence tool created by Google. It is based on LaMDA, an experimental language model developed by Google with the purpose of enhancing dialogue applications.

Claude is a language model developed by Anthropic, built to process and respond to information in natural language. It has also been integrated into third-party applications such as Slack and Quora's Poe.

GPT-4 is a language model created by OpenAI. It employs a neural network architecture that is trained on large datasets to understand and generate text in natural language.

How to compare LLMs? Evaluation criteria

To ensure a fair comparison, we set up a structured evaluation framework. The summaries were compared based on:

Completeness: Does the summary encapsulate all the key points from the article?
Clarity: Can readers easily grasp the summarised content? Is jargon avoided?
Conciseness: Is the summary concise, eliminating unnecessary fluff?
Structure: Does the summary follow a logical flow and organization?
Relevance: Does the summary keep its focus on the primary theme of the article?

Using these parameters as a reference, we will evaluate the strengths and differences of each LLM.
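The five criteria above can be turned into a simple scoring rubric. The sketch below is only an illustration, not the method used for this article: it assumes a reviewer scores each criterion from 1 to 5 and that all criteria carry equal weight. The names `SummaryScore`, `rank`, `model_a` and `model_b` are hypothetical, and the scores are placeholders rather than results.

```python
from dataclasses import dataclass
from statistics import mean

# The five criteria listed in this article.
CRITERIA = ("completeness", "clarity", "conciseness", "structure", "relevance")


@dataclass
class SummaryScore:
    """Reviewer scores for one summary, each on an assumed 1-5 scale."""
    model: str
    scores: dict[str, int]

    def __post_init__(self) -> None:
        # Every criterion must be scored, and every score must be in range.
        missing = set(CRITERIA) - set(self.scores)
        if missing:
            raise ValueError(f"missing criteria: {sorted(missing)}")
        for name, value in self.scores.items():
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def overall(self) -> float:
        """Unweighted mean across the five criteria (equal weights assumed)."""
        return mean(self.scores[c] for c in CRITERIA)


def rank(results: list[SummaryScore]) -> list[str]:
    """Return model names ordered from highest to lowest overall score."""
    ordered = sorted(results, key=lambda r: r.overall(), reverse=True)
    return [r.model for r in ordered]


# Placeholder scores for illustration only, not results from this article.
example = [
    SummaryScore("model_a", dict(zip(CRITERIA, (5, 4, 4, 4, 5)))),
    SummaryScore("model_b", dict(zip(CRITERIA, (3, 5, 5, 4, 4)))),
]
print(rank(example))
```

If some criteria matter more for your use case, for example completeness for legal or financial texts, the plain mean in `overall` could be swapped for a weighted one.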

Testing LLM text summarisations

Having set out the evaluation criteria and introduced the original article, it is time to look at the output of each LLM.
Below you can see the output provided by each language model. Each output is the answer to the same prompt, which asked the model to summarise the article on the importance of digital transformation and highlight its key points.
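For a fair comparison, every model should receive exactly the same prompt. The sketch below shows one way to arrange that. It makes several assumptions: the prompt wording is hypothetical (the exact prompt is not quoted in this article), each model is represented by a caller-supplied function rather than a real vendor API, and `compression_ratio` is an added helper that gives a rough, word-count-based signal for the conciseness criterion.

```python
from typing import Callable

# Hypothetical prompt wording; the exact prompt is not quoted in the article.
PROMPT_TEMPLATE = (
    "Summarise the following article and highlight its key points.\n\n"
    "Article:\n{article}"
)


def build_prompt(article: str) -> str:
    """Fill the shared template so every model receives identical instructions."""
    return PROMPT_TEMPLATE.format(article=article.strip())


def collect_summaries(
    article: str, models: dict[str, Callable[[str], str]]
) -> dict[str, str]:
    """Send the same prompt to each model function and collect the replies."""
    prompt = build_prompt(article)
    return {name: ask(prompt) for name, ask in models.items()}


def compression_ratio(article: str, summary: str) -> float:
    """Summary length as a fraction of the source length, counted in words."""
    return len(summary.split()) / max(len(article.split()), 1)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; replace with your own client code."""
    return "stub summary"


summaries = collect_summaries("Some article text.", {"stub": stub_model})
print(summaries)
```

In practice, each entry in `models` would wrap the client code for one provider, so the comparison logic stays the same whichever LLMs are being tested.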

Claude vs. Bard vs. GPT-4: the results

Claude: this LLM delivered an exemplary summary, closely aligned with the original content. It captured the essence of digital transformation, the unique challenges faced by SMBs, the importance of data analytics expertise, and the benefits of outsourcing. It was both comprehensive and highly relevant. Two of Claude’s key characteristics are its adaptability and its more creative storytelling.

Bard: it presented a clear and concise overview, emphasising the strategic significance of digital transformation and the inherent value of outsourcing. What sets Bard’s result apart is its efficiency in searching for information and its integration with the Google search engine. As a result, it adds a section with additional points that do not appear in the original text, which is very interesting.

GPT-4: finally, GPT-4 brought a distinctive viewpoint to the table, encapsulating several crucial points. This model is characterised by capturing the user’s intention, tone and needs and tailoring its response accordingly. It also works from the data provided to it in the prompt. One limitation to bear in mind is that its training data extends only to 2021. For summarisation, however, this need not affect the result, provided prompts are written accordingly.

All three large language models become more effective and accurate as they are trained further, and the more information the prompt provides, the better the result.

Which LLM is best for summarising text? Conclusions

In the current language technology landscape, three names stand out: Claude, Bard and GPT-4. As we have seen throughout this article, all three are leading options for summarising text.

What really matters, however, is choosing the LLM that best suits your specific needs. Experimenting with these options and comparing them directly is vital to understanding the strengths and particularities of each one.

Choosing the right LLM can make all the difference to the effectiveness and impact of the summaries generated. Therefore, having an expert tech partner’s support to advise you professionally is essential in this process.

Other advantages of LLMs for business

Using AI language models can be very valuable for companies that generate large volumes of data. You can read more in this article about how you can elevate your client experiences with LLM-driven services.

Some key benefits of these models are:

Strong understanding of human language, which helps streamline tasks.

Improved content generation, thanks to their generative capabilities.

Increased efficiency, as large language models can automate tasks, freeing up time that can be used to generate value for the business.

Do you want to discover all the potential that LLMs can bring to your company?