What Happened in AI This Week: LLM Performance, GPT-4 Evaluations, and More
In AI this week, much attention has centered on the performance of large language models, particularly GPT-4. A recent study by researchers from Stanford and Berkeley raised concerns about a decline in GPT-4's performance over time: accuracy on tasks such as identifying prime numbers and answering coding questions dropped significantly between model versions. It's important to note, however, that the authors' intent was not to show that the quality of OpenAI's APIs is degrading, but to draw attention to instability: applications can break when the underlying model's responses change.
Thanks for reading SolanAI’s NewsLetters -Master AI, Master Life-! Subscribe for free to receive new posts and support my work.
OpenAI has responded to these claims, reassuring users that it is actively working to improve API stability. Developers can now pin a specific model version, giving them more control and predictability in their applications.
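In practice, pinning means passing a dated snapshot name to the chat-completions endpoint instead of the floating `gpt-4` alias, which auto-updates to the latest release. A minimal stdlib-only sketch follows; the snapshot name `gpt-4-0613` is the dated GPT-4 release available at the time of writing, and the helper names (`build_request`, `ask`) are illustrative, not part of any official SDK:

```python
import json
import os
import urllib.request

# Dated snapshot: pinning this instead of the floating "gpt-4" alias
# keeps model behavior stable until the snapshot is deprecated.
PINNED_MODEL = "gpt-4-0613"

API_URL = "https://api.openai.com/v1/chat/completions"


def build_request(question: str) -> dict:
    """Build a chat-completions payload with the model version pinned."""
    return {
        "model": PINNED_MODEL,  # exact snapshot, not the auto-updating alias
        "messages": [{"role": "user", "content": question}],
    }


def ask(question: str) -> str:
    """POST the pinned request to the API and return the reply text."""
    payload = json.dumps(build_request(question)).encode("utf-8")
    req = urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The official `openai` Python package accepts the same `model` string, so the only change needed in an existing app is swapping the alias for the snapshot name.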
On a more positive note, another study comparing GPT-4 to medical students on clinical case exams found that GPT-4 outperformed first- and second-year Stanford students. The result has sparked discussion about rethinking how students are evaluated.
In other exciting developments, Meta has released Llama 2, an open-source model available under a commercial license, with performance comparable to ChatGPT's. LangChain has introduced LangSmith, a platform aimed at helping developers bridge the gap between prototype and production in LLM applications. Meanwhile, Apple is reportedly testing "Apple GPT," an internal AI chatbot similar to ChatGPT. And Cerebras Systems has signed a $100 million deal to build AI supercomputers that could challenge Nvidia's dominant position in the market.
OpenAI is also improving the user experience by introducing personalized custom instructions for ChatGPT, letting users tailor how the model responds across conversations. The feature is rolling out gradually to all users.
For further reading, this week's resources cover Llama 2, hallucination in AI, building an AI WebTV, the shift toward AI-assisted programming, and techniques for keeping AI-generated visuals consistent. Happy reading and watching!