GPT-4 is OpenAI's new large multimodal language model. However, contrary to some reports, it does not support generating videos from text.
GPT-4 accepts text and image inputs and generates text output. According to OpenAI's website, it can handle images from a variety of domains, including documents containing text, photographs, diagrams, and screenshots.
However, image input is currently in "research preview," so it is not yet publicly available.
OpenAI stated that while GPT-4 is less capable than humans in many real-world scenarios, it performs at a human level on various academic and professional benchmarks.
It passed a simulated bar exam with a score around the top 10% of test takers; GPT-3.5's score was around the bottom 10%.
Leaps Over Past Models
Casetext, maker of the AI legal assistant CoCounsel, was one of the first companies to use GPT-4. It claims the model can pass both the multiple-choice and written sections of the Uniform Bar Exam.
In a statement, Pablo Arredondo, cofounder and chief innovation officer at Casetext, said that GPT-4 surpasses the power of earlier language models, and that the model's ability not only to generate text but also to interpret it signals something new for the practice of law.
"Casetext's CoCounsel is changing the way law is practiced by automating crucial, time-intensive tasks and freeing up our lawyers to concentrate on the most important aspects of practice," Frank Ryan, Americas chair of DLA Piper, said in a press release.
OpenAI stated that it spent six months aligning GPT-4 using lessons from its adversarial testing program as well as from ChatGPT. This produced its best results ever, though still far from perfect, on factuality, steerability, and refusing to go outside of guardrails.
The GPT-4 training run was also extremely stable, according to the company, making it the first large-scale model whose training performance OpenAI could accurately predict ahead of time.
It stated that "as we continue to concentrate on reliable scaling," it would "hone our method to help us anticipate and prepare for future capabilities further in advance, something which we consider critical to safety."
OpenAI pointed out that the differences between GPT-3.5 and GPT-4 can be subtle, emerging only once task complexity reaches a sufficient threshold. GPT-4 is more creative and reliable than GPT-3.5 and can handle more complex instructions.
GPT-4 is also more customizable than its predecessor. Instead of the classic ChatGPT personality, with its fixed tone, voice, and style, developers can now prescribe their AI's style and task by using the "system message," OpenAI explained. System messages enable API users to personalize their users' experience, within certain limits.
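OpenAI's Chat Completions API places the system message first in the `messages` array, ahead of the user's queries. A minimal sketch of how a developer might set a custom persona this way (the Socratic-tutor persona and the prompt text here are invented examples, not taken from OpenAI's documentation):

```python
# Sketch of the "system message" mechanism, using the JSON payload
# format of OpenAI's Chat Completions API. The persona below is an
# illustrative assumption.

def build_chat_request(system_message: str, user_prompt: str) -> dict:
    """Assemble the payload for a chat completion call.

    The first entry, with role "system", fixes the assistant's tone,
    voice, and task; subsequent "user" entries carry the actual queries.
    """
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_prompt},
        ],
    }

payload = build_chat_request(
    "You are a Socratic tutor. Never state answers outright; "
    "respond only with guiding questions.",
    "How do I solve x^2 - 5x + 6 = 0?",
)
# This dict would be POSTed to https://api.openai.com/v1/chat/completions
# with an Authorization: Bearer <API key> header.
```

The same system message persists across a conversation, which is what lets developers fix a tone or task once rather than repeating instructions in every user prompt.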
However, API users will have to wait to test the feature: access to the GPT-4 API is limited by a waitlist.
OpenAI admitted that GPT-4 has limitations similar to those of earlier GPT models. It is still not fully reliable: it "hallucinates" facts and makes reasoning errors.
OpenAI advised that language model outputs should be used with great care, especially in high-stakes situations.
GPT-4 can also be confidently wrong in its predictions, failing to double-check its work when a mistake is likely.
Over the weekend, anticipation for GPT-4's release ran high after a German Microsoft executive suggested that the final package might include a text-to-video capability.
Andreas Braun, chief technology officer of Microsoft Germany, had said that GPT-4 would be introduced the following week.
Rob Enderle, president and principal analyst of the Enderle Group, an advisory services firm in Bend, Ore., said that text-to-video could be disruptive.
The technology could dramatically change how movies and TV programs are made and how news programs are formatted, while also allowing for user customization, he told TechNewsWorld.
Enderle pointed out that the technology could be used to create storyboards from draft scripts. "As this technology matures, those results will get closer to a final product."
Greg Sterling, cofounder of Near Media, a news, commentary, and analysis website, stated that text-to-video apps still create fairly basic content.
Text-to-video could be disruptive because it will allow more video content to be created at low, or nearly zero, cost, he told TechNewsWorld.
"The quality and efficacy of that video is another matter," he said. "But I think some of it will still be decent."
He said that explainers and basic informational content would be good candidates for text-to-video.
"I can imagine some agencies using it to create videos for SMBs to use on their sites or on YouTube for ranking purposes," he said.
"It won't be good, at least at the beginning, at any brand content," he added. "Social media content is another possible use. YouTube creators will use it to produce volume for views and ad revenue."
Deepfake Dangers
As ChatGPT has shown, technology such as text-to-video can pose dangers.
"The most dangerous use cases, as with all tools like these, are the garden-variety scams that impersonate people to their relatives, or attacks on particularly vulnerable persons and institutions," stated Will Duffield, a policy analyst at the Cato Institute, a Washington, D.C. think tank.
Duffield, however, dismissed the idea of text-to-video being used to create effective “deepfakes.”
He explained that even well-resourced attempts, such as last year's Russian deepfake of Zelenskyy surrendering, have failed because there was enough context and expectation to disprove them.
"We have very clear notions about who public figures are, what their motivations are, and what we can expect them to do," he said. "So when media portrays them acting in an unusual way, or one that isn't consistent with those expectations, we're likely to be very critical of it."