maadaa AI News: Adobe Buys Videos at $3/min, TikTok Tests Virtual Influencers for eCommerce & Ads, Musk Releases Grok-1.5V MLLM & More

maadaa.ai
5 min read · Apr 17, 2024

(maadaa AI News Weekly: April 9 to April 15)

1. Adobe Purchases Videos At $3 Per Minute To Build AI Models

News:

Adobe is expanding its generative AI capabilities by acquiring a large dataset of short video clips depicting people engaged in various actions and emotions.

Key Points:

- Adobe is offering artists $120 to submit more than 100 short video clips featuring people, body parts, and object interactions.

- The aim is to amass a dataset for training Adobe’s AI models, intended for integration with creative software like Photoshop and Illustrator.

- This underscores the vast data requirement for advanced AI model development and the controversy over data sourcing.

- Adobe trains its AI models on its own stock media library rather than on user-generated content from platforms like YouTube and Facebook.
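As a rough sanity check on the figures above, the short Python sketch below relates the $120 payment to the roughly $3-per-minute rate in the headline. It assumes, purely for illustration, that the payment is priced at exactly that rate and that a submission contains exactly 100 clips. The source does not state clip lengths, and the variable names are hypothetical.

```python
# Figures quoted in this article (treated as exact for illustration only).
PAYMENT_USD = 120        # payment offered per submission
RATE_USD_PER_MIN = 3     # approximate per-minute rate from the headline
MIN_CLIPS = 100          # "over 100 short video clips"

# Assumption: the $120 payment is priced at the headline rate.
minutes_covered = PAYMENT_USD / RATE_USD_PER_MIN

# Implied average clip length if the footage is split across exactly 100 clips.
# With more than 100 clips, the average would be shorter.
avg_clip_seconds = minutes_covered * 60 / MIN_CLIPS

print(f"Footage covered by the payment: {minutes_covered:.0f} minutes")
print(f"Implied average clip length: at most {avg_clip_seconds:.0f} seconds")
```

Under these assumptions, one submission corresponds to about 40 minutes of footage, which fits the description of the clips as short.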

Why It Matters?

Adobe’s proprietary video dataset initiative highlights the importance of data provenance and copyright in AI. By using legitimate sources for its training material, Adobe is promoting ethical AI development and aiming to avoid legal issues.

2. TikTok Explores Virtual Influencers for Enhanced E-commerce and Advertising

News:

TikTok is developing an AI-powered feature to create virtual influencers for product promotion. However, it is not yet available to marketers because virtual influencers have so far generated lower e-commerce sales than human influencers.

Key Points:

- TikTok is working on an AI feature to generate virtual influencers for product endorsements.

- These AI entities would read scripts generated from prompts supplied by advertisers or TikTok Shop vendors.

- Despite ongoing tests, the feature is not yet accessible to marketers due to its lower e-commerce performance compared to human influencers.

- Sponsorship revenue distribution between virtual and human influencers raises concerns about real creators’ opportunities.

Why It Matters?

TikTok is enhancing its advertising and e-commerce potential through AI with the introduction of virtual influencers. This could significantly affect its creator ecosystem and brand advertising strategies. TikTok needs to address potential concerns, especially given scrutiny over its China connections and a possible US ban.

3. Musk’s Multimodal Model Grok-1.5V Is Released

News:

Elon Musk’s AI firm, xAI, introduces its first multimodal model, Grok-1.5 Vision (Grok-1.5V). This model can interpret a broad range of visual data, such as documents, diagrams, charts, screenshots, and photos.

Key Points:

- Grok-1.5V can convert a whiteboard sketch to Python code, create a bedtime story from a child’s drawing, and transform a table into a CSV file.

- The model rivals top multimodal models like GPT-4 in tasks that require multi-disciplinary reasoning and an understanding of spatial relationships.

- Grok-1.5V has shown success on the RealWorldQA benchmark, testing an AI’s comprehension and reasoning about the physical world.

- xAI aims to boost Grok-1.5V’s capabilities by broadening its ability to process diverse modalities such as images, audio, and video.

Why It Matters?

The development of Grok-1.5V represents a significant advancement in AI, emphasizing the value of multimodal datasets that include images, text, and audio. This enhances AI’s capability for more natural interaction with the world by processing diverse visual data, which is crucial for complex tasks like document analysis and visual reasoning. Additionally, it promotes innovation in creative fields by leveraging various data types.

https://x.ai/blog/grok-1.5

* Additional News:

1. Humane’s AI Pin, a groundbreaking $699 wearable AI device, is now available, marking a significant milestone in personal AI technology.

2. Google has expanded Google Photos’ editing features, yet advanced AI tools like Magic Editor remain exclusive to Google One subscribers and Pixel owners, not free for all as once stated.

3. Meta has unveiled the latest version of its custom AI chip, the “next-gen” Meta Training and Inference Accelerator (MTIA), which is said to deliver three times the AI performance of the previous generation.

4. Amazon has added computer scientist Andrew Ng to its board, enhancing its AI capabilities to better compete in generative AI.

5. Poe, Quora’s AI chatbot platform, now lets creators earn money per message sent to their bots, moving beyond earnings from premium subscriptions.

Open & Commercial AI Training Datasets

1. Kinetics Dataset

The Kinetics Dataset is a large collection of video clips for action recognition, featuring up to 650,000 clips that cover 400, 600, or 700 classes of human actions depending on the version, including both human-object and human-human interactions. The three versions, Kinetics400, Kinetics600, and Kinetics700, are named for their number of action categories, and each category contains at least 400, 600, or 700 clips respectively. Each clip lasts about 10 seconds and is annotated with a single action class, and together the clips cover a wide range of real-world scenes and actions.
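To give a sense of scale, the following Python sketch turns the figures quoted above into a total-footage estimate and a minimum clip count per version. It is a back-of-the-envelope calculation, not an official statistic. It assumes every clip is exactly 10 seconds long and reads "a minimum of 400/600/700 clips" as the per-class minimum for Kinetics400, Kinetics600, and Kinetics700 respectively. The function and variable names are hypothetical.

```python
# Figures quoted in the description above.
MAX_CLIPS = 650_000      # "up to 650,000 clips"
CLIP_SECONDS = 10        # assumed exact clip length in seconds

# Assumption: the per-class minimum equals the class count in each version.
VERSIONS = {"Kinetics400": 400, "Kinetics600": 600, "Kinetics700": 700}


def total_hours(clips: int, seconds_per_clip: int) -> float:
    """Total footage in hours for a given number of fixed-length clips."""
    return clips * seconds_per_clip / 3600


def min_total_clips(num_classes: int, min_clips_per_class: int) -> int:
    """Lower bound on dataset size implied by a per-class minimum."""
    return num_classes * min_clips_per_class


upper_bound_hours = total_hours(MAX_CLIPS, CLIP_SECONDS)
lower_bounds = {name: min_total_clips(n, n) for name, n in VERSIONS.items()}

print(f"Upper bound on footage: {upper_bound_hours:.1f} hours")
for name, clips in lower_bounds.items():
    print(f"{name}: at least {clips:,} clips")
```

Under these assumptions the largest version holds roughly 1,800 hours of video, and the implied minimum for Kinetics700 (490,000 clips) sits below the quoted 650,000-clip ceiling.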

2. ActivityNet Captions

ActivityNet Captions contains 20k videos totaling 849 hours, with 100k descriptions in total, each with its own start and end time. The sentences have an average length of 13.48 words, and sentence length is approximately normally distributed. The dataset is often used for video-text retrieval and video-moment retrieval tasks. Fig 12 shows an example from ActivityNet Captions: the video is divided into five segments, and each segment has a descriptive sentence.
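The per-video averages implied by these totals can be worked out with a few lines of Python. This is a minimal sketch that treats the rounded figures above (20k videos, 100k descriptions) as exact, so the results are approximate. The variable names are hypothetical.

```python
# Figures quoted in the description above (rounded in the source).
VIDEOS = 20_000
TOTAL_HOURS = 849
DESCRIPTIONS = 100_000
AVG_WORDS_PER_SENTENCE = 13.48

# Average number of timestamped descriptions attached to each video.
descriptions_per_video = DESCRIPTIONS / VIDEOS

# Average video length in minutes.
minutes_per_video = TOTAL_HOURS * 60 / VIDEOS

# Approximate size of the caption text in words.
approx_total_words = DESCRIPTIONS * AVG_WORDS_PER_SENTENCE

print(f"Descriptions per video: {descriptions_per_video:.1f}")
print(f"Average video length: {minutes_per_video:.2f} minutes")
print(f"Approximate caption words: {approx_total_words:,.0f}")
```

The average of about five descriptions per video matches the five-segment example mentioned above.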

3. Fashion & E-commerce Open Dataset From maadaa.ai

The Fashion & e-Commerce Open Datasets are aimed at driving AI innovation in fashion and e-commerce. The collection covers 24 scenarios, reflecting the diversity of real-world settings, and includes 33 highly detailed sub-datasets. It is designed for a variety of applications such as object detection, pose estimation, and personalized recommendations, and is expected to foster innovative solutions in the industry.

4. Multi-modal Generative AI Large Datasets — Licensed From maadaa.ai

maadaa.ai’s large dataset, Multi-modal Generative AI Large Datasets — Licensed, is developed specifically for training state-of-the-art multi-modal large language models. It includes structured datasets such as image-text pairs, video-text pairs, and e-books in Markdown. Licensed in accordance with international copyright rules, the dataset is intended to bring authenticity and diversity to generative AI model training, with the aim of improving model accuracy and supporting innovation.

--

maadaa.ai

maadaa.ai is committed to providing professional, agile and secure data products and services to the global AI industry.