ElevenLabs Brings Sound to OpenAI's Sora

AI Sound effects

Here are the 4 interesting AI things that I learned and enjoyed this week.

Understanding different types of documents like business forms and invoices is crucial. Traditional AI methods such as LLMs and GNNs often struggle with the layout and connections within these documents. More about GNNs in today’s AI concept segment. Researchers developed a system called DocGraphLM to tackle this issue. It combines understanding content and document structure to see how parts of a document are related in terms of position and connection. The AI treats documents like maps, making it better at pulling out information and answering questions about them. Imagine you have an invoice with items listed, their prices, and a total amount due. DocGraphLM can identify and understand the layout, like which numbers are prices and which text describes items. It then connects these pieces, figuring out item-price pairs.
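To make the invoice example concrete, here is a minimal sketch of the idea of linking text boxes by position. It is not DocGraphLM's actual method: the `Box` class, the `is_price` rule, and the `row_tolerance` value are all assumptions made up for illustration, and a real system would learn these relationships rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A piece of text on the page with the position of its top-left corner."""
    text: str
    x: float
    y: float


def is_price(text):
    # Hypothetical rule for this sketch: a price starts with a dollar sign.
    return text.startswith("$")


def pair_items_with_prices(boxes, row_tolerance=5.0):
    """Link each non-price box to the nearest price box to its right on the same row."""
    pairs = {}
    for item in boxes:
        if is_price(item.text):
            continue
        candidates = [
            b for b in boxes
            if is_price(b.text)
            and abs(b.y - item.y) <= row_tolerance  # roughly the same row
            and b.x > item.x                        # located to the right
        ]
        if candidates:
            nearest = min(candidates, key=lambda b: b.x - item.x)
            pairs[item.text] = nearest.text
    return pairs


# A made-up invoice: labels on the left, amounts on the right.
invoice = [
    Box("Widget", 10, 100), Box("$4.00", 200, 101),
    Box("Gadget", 10, 120), Box("$6.50", 200, 119),
    Box("Total", 10, 160), Box("$10.50", 200, 160),
]

pairs = pair_items_with_prices(invoice)
print(pairs)
```

Each label ends up connected to the amount on its row, which is the kind of item-price link described above.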

AI Meme

In our last edition, we discussed Sora's ability to create realistic videos from text. Now, ElevenLabs has taken it a step further by adding sound effects that match the video scenes, enhancing the sample videos from OpenAI with background sounds. This advancement could significantly speed up video production, especially in post-production, where adding sound effects is time-consuming and costly. Imagine creating a video about a beach and having the sound of waves automatically added to it. This tool represents a big leap in making video creation more efficient and accessible. I personally believe it could speed up delivery by 3x.

In our 4th edition, we tried to describe LLMs. Today, we will be talking about GNNs. Graph Neural Networks are a type of AI that helps computers understand and work with data in the form of graphs, much like social networks. Imagine graphs as networks of points connected by lines, where points represent people and lines show the relationships between them. GNNs are designed to analyze these connections, learning patterns and insights that are difficult to spot with traditional AI methods. For instance, in social networks, GNNs can identify communities or suggest friends by understanding the intricate web of relationships. This makes them good at tasks where the connections and relationships between data points are crucial for making decisions.
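To give a feel for how a GNN uses connections, here is a minimal sketch of a single "message passing" step, where every person blends their own number with those of their friends. The names, the friendship graph, and the feature values are invented for illustration, and a real GNN would also apply learned weights and repeat this step across several layers.

```python
# Friendship graph as an adjacency list (hypothetical people).
graph = {
    "Ana": ["Ben", "Cy"],
    "Ben": ["Ana", "Cy"],
    "Cy": ["Ana", "Ben", "Dee"],
    "Dee": ["Cy"],
}

# One made-up feature per person, e.g. weekly hours spent on a hobby.
features = {"Ana": 1.0, "Ben": 3.0, "Cy": 5.0, "Dee": 7.0}


def message_passing_step(graph, features):
    """Replace each node's feature with the mean of its own and its neighbors' features."""
    updated = {}
    for node, neighbors in graph.items():
        values = [features[node]] + [features[n] for n in neighbors]
        updated[node] = sum(values) / len(values)
    return updated


updated = message_passing_step(graph, features)
print(updated)
```

After one step, Ana and Ben, who share the same friends, end up with the same value, which hints at how repeated steps let a GNN pick up on communities.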

I'm sharing an effective way to write a prompt for any scenario in ChatGPT or Bard to get more accurate results. It's called the TCEP (Task, Context, Exemplar, and Persona) framework, and its parts are listed below in order of importance.

  • Task always starts with an action verb such as create, summarize, or write.

  • Context should answer questions such as: What is the user's background? What does success look like?

  • Exemplar means providing examples within the prompt to get more accurate results.

  • Persona answers the question of whom you want the AI to impersonate.

As an experienced newsletter curator, use the three-act structure framework to summarize the top 3 takeaways from the research paper for the AI research section of my newsletter.

Sample prompt using TCEP Framework
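If you reuse the same structure often, the four TCEP parts can be assembled with a small helper. This is only a sketch: the function name `build_tcep_prompt` and the labeled-line layout are my own assumptions, not part of the framework or of any ChatGPT or Bard API.

```python
def build_tcep_prompt(task, context="", exemplar="", persona=""):
    """Join the TCEP parts in order of importance, skipping any left empty."""
    if not task:
        raise ValueError("A TCEP prompt needs at least a task.")
    sections = [
        ("Task", task),
        ("Context", context),
        ("Exemplar", exemplar),
        ("Persona", persona),
    ]
    return "\n".join(f"{label}: {text}" for label, text in sections if text)


# The sample prompt from this section, split into its parts (no exemplar given).
prompt = build_tcep_prompt(
    task="Summarize the top 3 takeaways from the research paper using a three-act structure.",
    context="The summary is for the AI research section of my newsletter.",
    persona="An experienced newsletter curator.",
)
print(prompt)
```

The result is plain text that can be pasted into any chat assistant.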

⏭️ Stay curious, keep questioning.