Integrating Image-To-Text And Text-To-Speech Models (Part 2)
In Part 1 of this brief two-part series, we developed an application that turns images into audio descriptions using vision-language and text-to-speech models. We combined an image-to-text model that analyses an image and generates a description with a text-to-speech model that turns that description into audio, helping people with sight challenges. We also discussed how to choose the right model to fit your needs.
Now, we are taking things a step further. Instead of just providing audio descriptions, we are building an application that can have interactive conversations about images or videos. This is known as Conversational AI — a technology that lets users talk to systems much like chatbots, virtual assistants, or agents.
While the first iteration of the app was great, the output still lacked some details. For example, if you upload an image of a dog, the description might be something like “a dog sitting on a rock in front of a pool,” missing additional details such as the dog’s breed, the time of day, or the location.
The aim here is simply to build a more advanced version of the previously built app so that it not only describes images but also provides more in-depth information and engages users in meaningful conversations about them.
We’ll use LLaVA, a model that combines image understanding with conversational capabilities. After building our tool, we’ll explore multimodal models that can handle images, videos, text, audio, and more, all at once, to give you even more options and flexibility for your applications.
Visual Instruction Tuning and LLaVA
We are going to look at visual instruction tuning and the multimodal capabilities of LLaVA. We’ll first explore how visual instruction tuning can enhance large language models to understand and follow instructions that include visual information. After that, we’ll dive into LLaVA, which brings its own set of tools for image and video processing.
Visual Instruction Tuning
Visual instruction tuning is a technique that helps large language models (LLMs) understand and follow instructions based on visual inputs. This approach connects language and vision, enabling AI systems to understand and respond to human instructions that involve both text and images. For example, Visual IT enables a model to describe an image or answer questions about a scene in a photograph. This fine-tuning method makes the model more capable of handling these complex interactions effectively.
There’s also a newer training approach called LLaVAR, which you can think of as a tool for handling tasks related to PDFs, invoices, and other text-heavy images. It’s pretty exciting, but we won’t dive into it since it is outside the scope of the app we’re making.
Examples of Visual Instruction Tuning Datasets
To build good models, you need good data — rubbish in, rubbish out. So, here are two datasets that you might want to use to train or evaluate your multimodal models. Of course, you can always add your own datasets to the two I’m going to mention.
- Instruction datasets: English;
- Multi-task: Datasets containing multiple tasks;
- Mixed dataset: Contains both human and machine-generated data.
Vision-CAIR provides a high-quality, well-aligned image-text dataset created using conversations between two bots. This dataset was initially introduced in a paper titled “MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models,” and it provides more detailed image descriptions and can be used with predefined instruction templates for image-instruction-answer fine-tuning.
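To make this concrete, here’s a rough sketch of what a single image-instruction-answer record could look like; the field names and values are purely illustrative and not taken from either dataset.
#python
# A hypothetical image-instruction-answer record for visual instruction tuning.
# Field names are illustrative only, not copied from a specific dataset.
sample_record = {
    "image": "images/dog_0001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat breed is the dog, and what is it doing?"},
        {"from": "assistant", "value": "A golden retriever is resting on a rock beside a pool in the late afternoon."},
    ],
}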
There are more multimodal datasets out there, but these two should help you get started if you want to fine-tune your model.
Let’s Take a Closer Look At LLaVA
LLaVA (which stands for Large Language and Vision Assistant) is a groundbreaking multimodal model developed by researchers from the University of Wisconsin, Microsoft Research, and Columbia University. The researchers aimed to create a powerful, open-source model that could compete with the best in the field, such as GPT-4, Claude 3, or Gemini, to name a few. For developers like you and me, its open nature is a huge benefit, allowing for easy fine-tuning and integration.
One of LLaVA’s standout features is its ability to understand and respond to complex visual information, even with unfamiliar images and instructions. This is exactly what we need for our tool, as it goes beyond simple image descriptions to engage in meaningful conversations about the content.
Architecture
LLaVA’s strength lies in its smart use of existing models. Instead of starting from scratch, the researchers used two key models:
- CLIP ViT-L/14
This is an advanced version of the CLIP (Contrastive Language–Image Pre-training) model developed by OpenAI. CLIP learns visual concepts from natural language descriptions. It can handle any visual classification task by simply being given the names of the visual categories, similar to the “zero-shot” capabilities of GPT-2 and GPT-3.
- Vicuna
This is an open-source chatbot trained by fine-tuning LLaMA on 70,000 user-shared conversations collected from ShareGPT. Training Vicuna-13B costs around $300, and it performs exceptionally well, even when compared to other models like Alpaca.
These components make LLaVA highly effective by combining state-of-the-art visual and language understanding capabilities into a single powerful model, perfectly suited for applications requiring both visual and conversational AI.
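To picture how the two pieces fit together, here is a minimal conceptual sketch of the projection step that bridges them. The dimensions are illustrative assumptions (1024-d CLIP ViT-L/14 patch features, a 4096-d embedding space for a 7B-parameter Vicuna), and this toy code is not LLaVA’s actual implementation.
#python
import torch
import torch.nn as nn

# Illustrative dimensions: CLIP ViT-L/14 patch features (1024-d) projected into
# the language model's embedding space (4096-d for a 7B-parameter Vicuna).
clip_feature_dim = 1024
llm_embedding_dim = 4096

# The projection layer acts as the bridge between the frozen vision encoder
# and the language model.
projector = nn.Linear(clip_feature_dim, llm_embedding_dim)

# A fake batch of patch features standing in for the frozen CLIP encoder's output:
# shape (batch, num_patches, feature_dim).
image_features = torch.randn(1, 576, clip_feature_dim)

# "Visual tokens" the language model can consume alongside its text embeddings.
visual_tokens = projector(image_features)
print(visual_tokens.shape)  # torch.Size([1, 576, 4096])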
Training
LLaVA’s training process involves two important stages, which together enhance its ability to understand user instructions, interpret visual and language content, and provide accurate responses. Let’s detail what happens in these two stages:
- Pre-training for Feature Alignment
LLaVA ensures that its visual and language features are aligned. The goal here is to update the projection matrix, which acts as a bridge between the CLIP visual encoder and the Vicuna language model. This is done using a subset of the CC3M dataset, allowing the model to map input images and text to the same space. This step ensures that the language model can effectively understand the context from both visual and textual inputs.
- End-to-End Fine-Tuning
The entire model undergoes fine-tuning. While the visual encoder’s weights remain fixed, the projection layer and the language model are adjusted.
The second stage is tailored to specific application scenarios:
- Instruction-Based Fine-Tuning
For general applications, the model is fine-tuned on a dataset designed for following instructions that involve both visual and textual inputs, making the model versatile for everyday tasks.
- Scientific Reasoning
For more specialized applications, particularly in science, the model is fine-tuned on data that requires complex reasoning, helping the model excel at answering detailed scientific questions.
Now that we know what LLaVA is and the role it plays in our application, let’s turn our attention to the next component we need for our work: Whisper.
Using Whisper For Speech Recognition
In this chapter, we’ll check out Whisper, a great model for turning speech into text. Whisper is accurate and easy to use, making it perfect for transcribing spoken questions in our app. We’ve used Whisper in a different article, but here, we’re going to use a new version — large-v3. This updated version of the model offers even better performance and speed.
Whisper large-v3
Whisper was developed by OpenAI, the same folks behind ChatGPT. It is a pre-trained model for automatic speech recognition (ASR) and speech translation. The original Whisper was trained on 680,000 hours of labeled data.
Now, what’s different with Whisper large-v3 compared to other models? In my experience, it comes down to the following:
- Better inputs
Whisper large-v3 uses 128 Mel frequency bins instead of 80. Think of Mel frequency bins as a way to break down audio into manageable chunks for the model to process. More bins mean finer detail, which helps the model better understand the audio.
- More training
This specific Whisper version was trained on 1 million hours of weakly labeled audio and 4 million hours of pseudo-labeled audio collected using Whisper large-v2. From there, the model was trained for 2.0 epochs over this mix. You can confirm the mel-bin difference yourself with the short check after this list.
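Here is that check: the openai-whisper package exposes the model dimensions, so you can inspect the number of mel bins directly. This assumes the package is installed (as we do below) and the checkpoint can be downloaded.
#python
import whisper

# large-v3 reports 128 mel bins, while earlier checkpoints such as large-v2 report 80.
model = whisper.load_model("large-v3")
print(model.dims.n_mels)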
Whisper models come in different sizes, from tiny to large. Here’s a table comparing the differences and similarities:
| Size | Parameters | English-only | Multilingual |
|---|---|---|---|
| tiny | 39 M | ✅ | ✅ |
| base | 74 M | ✅ | ✅ |
| small | 244 M | ✅ | ✅ |
| medium | 769 M | ✅ | ✅ |
| large | 1550 M | ❌ | ✅ |
| large-v2 | 1550 M | ❌ | ✅ |
| large-v3 | 1550 M | ❌ | ✅ |
Alright, so we’re going with LLaVA for image inputs, and this time, we’re adding video inputs, too. This means the app can handle both images and videos, making it more versatile.
We’re also keeping the speech features so you can talk to the assistant and hear its replies, which makes the interaction even more engaging. How cool is that?
For the voice input, we’ll use Whisper, and for the spoken replies, gTTS. We’ll stick with the Gradio framework for the app’s visual layout and user interface. You can, of course, always swap in other models or frameworks — the main goal is to get a working prototype.
Installing and Importing the Libraries
We will start by installing and importing all the required libraries. This includes transformers for loading the LLaVA pipeline, the openai-whisper package for speech recognition, bitsandbytes for quantization, gTTS for text-to-speech, and moviepy to help in processing video files, including frame extraction.
#python
!pip install -q -U transformers==4.37.2
!pip install -q bitsandbytes==0.41.3 accelerate==0.25.0
!pip install -q git+https://github.com/openai/whisper.git
!pip install -q gradio
!pip install -q gTTS
!pip install -q moviepy
With these installed, we now need to import these libraries into our environment so we can use them. We’re doing all of this in Google Colab:
#python
import torch
from transformers import BitsAndBytesConfig, pipeline
import whisper
import gradio as gr
from gtts import gTTS
from PIL import Image
import re
import os
import datetime
import locale
import numpy as np
import nltk
import moviepy.editor as mp
nltk.download('punkt')
from nltk import sent_tokenize
# Set up locale
os.environ["LANG"] = "en_US.UTF-8"
os.environ["LC_ALL"] = "en_US.UTF-8"
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
Configuring Quantization and Loading the Models
Now, let’s set up a 4-bit quantization to make the LLaVA model more efficient in terms of performance and memory usage.
#python
# Configuration for quantization
quantization_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16
)
# Load the image-to-text model
model_id = "llava-hf/llava-1.5-7b-hf"
pipe = pipeline("image-to-text",
model=model_id,
model_kwargs={"quantization_config": quantization_config})
# Load the whisper model
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
model = whisper.load_model("large-v3", device=DEVICE)
In this code, we’ve configured quantization to four bits, which reduces memory usage and improves performance. Then, we load the LLaVA model with these settings. Finally, we load the Whisper model, selecting the device based on GPU availability for better performance.
Note: We’re using llava-v1.5-7b as the model. Please feel free to explore other versions of the model. For Whisper, we’re loading the large-v3 size, but you can also switch to another size, like “medium” or “small,” for your experiments.
To get our assistant up and running, we need to implement five essential functions:
- Handling conversations,
- Converting images to text,
- Converting videos to text,
- Transcribing audio,
- Converting text to speech.
Once these are in place, we will create another function to tie all this together seamlessly. The following sections provide the code that defines each function.
Conversation History
We’ll start by setting up the conversation history and a function to log it:
#python
# Initialize conversation history
conversation_history = []
def writehistory(text):
"""Write history to a log file."""
tstamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
logfile = f'{tstamp}_log.txt'
with open(logfile, 'a', encoding='utf-8') as f:
f.write(text + '\n')
Image to Text
Next, we’ll create a function to convert images to text by prompting LLaVA and extracting the assistant’s reply.
#python
def img2txt(input_text, input_image):
    """Convert an image to text by prompting LLaVA and extracting the assistant's reply."""
    try:
        image = Image.open(input_image)
        if isinstance(input_text, tuple):
            input_text = input_text[0]  # Take the first element if it's a tuple
        writehistory(f"Input text: {input_text}")
        prompt = "USER: <image>\n" + input_text + "\nASSISTANT:"
        outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
        if outputs and outputs[0]["generated_text"]:
            # The pipeline returns the prompt plus the reply, so grab everything after "ASSISTANT:".
            match = re.search(r'ASSISTANT:\s*(.*)', outputs[0]["generated_text"], re.DOTALL)
            reply = match.group(1) if match else "No response found."
            conversation_history.append(("User", input_text))
            conversation_history.append(("Assistant", reply))
            return reply
        return "No response generated."
    except Exception as e:
        return str(e)
Video to Text
We’ll now create a function to convert videos to text by extracting frames and analyzing them.
#python
def vid2txt(input_text, input_video):
"""Convert video to text by extracting frames and analyzing."""
try:
video = mp.VideoFileClip(input_video)
frame = video.get_frame(1) # Get a frame from the video at the 1-second mark
image_path = "temp_frame.jpg"
mp.ImageClip(frame).save_frame(image_path)
return img2txt(input_text, image_path)
except Exception as e:
return str(e)
Audio Transcription
Let’s add a function to transcribe audio to text using Whisper.
#python
def transcribe(audio_path):
"""Transcribe audio to text using Whisper model."""
if not audio_path:
return ''
audio = whisper.load_audio(audio_path)
audio = whisper.pad_or_trim(audio)
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)  # large-v3 expects 128 mel bins, not the default 80
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)
return result.text
Text to Speech
Lastly, we create a function to convert text responses into speech.
#python
def text_to_speech(text, file_path):
"""Convert text to speech and save to file."""
language = 'en'
audioobj = gTTS(text=text, lang=language, slow=False)
audioobj.save(file_path)
return file_path
With all the necessary functions in place, we can create the main function that ties everything together:
#python
def chatbot_interface(audio_path, image_path, video_path, user_message):
"""Process user inputs and generate chatbot response."""
global conversation_history
# Handle audio input
if audio_path:
speech_to_text_output = transcribe(audio_path)
else:
speech_to_text_output = ""
# Determine the input message
input_message = user_message if user_message else speech_to_text_output
# Ensure input_message is a string
if isinstance(input_message, tuple):
input_message = input_message[0]
# Handle image or video input
if image_path:
chatgpt_output = img2txt(input_message, image_path)
elif video_path:
chatgpt_output = vid2txt(input_message, video_path)
else:
chatgpt_output = "No image or video provided."
# Add to conversation history
conversation_history.append(("User", input_message))
conversation_history.append(("Assistant", chatgpt_output))
# Generate audio response
processed_audio_path = text_to_speech(chatgpt_output, "Temp3.mp3")
return conversation_history, processed_audio_path
Using Gradio For The Interface
The final piece for us is to create the layout and user interface for the app. Again, we’re using Gradio to build that out for quick prototyping purposes.
#python
# Define Gradio interface
iface = gr.Interface(
fn=chatbot_interface,
inputs=[
gr.Audio(type="filepath", label="Record your message"),
gr.Image(type="filepath", label="Upload an image"),
gr.Video(label="Upload a video"),
gr.Textbox(lines=2, placeholder="Type your message here...", label="User message (if no audio)")
],
outputs=[
gr.Chatbot(label="Conversation"),
gr.Audio(label="Assistant's Voice Reply")
],
title="Interactive Visual and Voice Assistant",
description="Upload an image or video, record or type your question, and get detailed responses."
)
# Launch the Gradio app
iface.launch(debug=True)
Here, we want to let users record or upload their audio prompts, type their questions if they prefer, upload videos, and, of course, have a conversation block.
Here’s a preview of how the app will look and work:
Looking Beyond LLaVA
LLaVA is a great model, but there are even greater ones that don’t require a separate ASR model to build a similar app. These are called multimodal or “any-to-any” models. They are designed to process and integrate information from multiple modalities, such as text, images, audio, and video. Instead of just combining vision and text, these models can do it all: image-to-text, video-to-text, text-to-speech, speech-to-text, text-to-video, and image-to-audio, just to name a few. It makes everything simpler and less of a hassle.
Examples of Multimodal Models that Handle Images, Text, Audio, and More
Now that we know what multimodal models are, let’s check out some cool examples. You may want to integrate these into your next personal project.
CoDi
So, the first on our list is CoDi or Composable Diffusion. This model is pretty versatile, not sticking to any one type of input or output. It can take in text, images, audio, and video and turn them into different forms of media. Imagine it as a sort of AI that’s not tied down by specific tasks but can handle a mix of data types seamlessly.
CoDi was developed by researchers from the University of North Carolina and Microsoft Azure. It uses something called Composable Diffusion to sync different types of data, like aligning audio perfectly with the video, and it can generate outputs that weren’t even in the original training data, making it super flexible and innovative.
ImageBind
Now, let’s talk about ImageBind, a model from Meta. This model is like a multitasking genius, capable of binding together data from six different modalities all at once: images, video, audio, text, depth, and even thermal data.
ImageBind doesn’t need explicit supervision to understand how these data types relate. It’s great for creating systems that use multiple types of data to enhance our understanding or create immersive experiences. For example, it could combine 3D sensor data with IMU data to design virtual worlds or enhance memory searches across different media types.
Gato
Gato is another fascinating model. It’s built to be a generalist agent that can handle a wide range of tasks using the same network. Whether it’s playing games, chatting, captioning images, or controlling a robot arm, Gato can do it all.
The key thing about Gato is its ability to switch between different types of tasks and outputs using the same model.
GPT-4o
The next on our list is GPT-4o, a groundbreaking multimodal large language model (MLLM) developed by OpenAI. It can handle any mix of text, audio, image, and video inputs and give you text, audio, and image outputs. It’s super quick, too, responding to audio inputs in as little as 232ms (about 320ms on average), which is close to human conversational response times.
There’s a smaller version of the model called GPT-4o Mini. Small models are becoming a trend, and this one shows that even small models can perform really well. Check out this evaluation to see how the small model stacks up against other large models.
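As a point of comparison with our local LLaVA setup, here is a minimal sketch of asking GPT-4o mini a question about an image through the OpenAI Python SDK. The image URL and the question are placeholders, and you’d need your own API key for this to run.
#python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Ask a question about an image, much like our LLaVA app does locally.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What breed is the dog, and what time of day does it look like?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/dog.jpg"}},  # placeholder URL
            ],
        }
    ],
)
print(response.choices[0].message.content)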
Conclusion
We covered a lot in this article, from setting up LLaVA for handling both images and videos to incorporating Whisper large-v3 for top-notch speech recognition. We also explored the versatility of multimodal models like CoDi or GPT-4o, showcasing their potential to handle various data types and tasks. These models can make your app more robust and capable of handling a range of inputs and outputs seamlessly.
Which model are you planning to use for your next app? Let me know in the comments!
Chris’ Corner: Variations on What Not to Do
I think the nail is in the coffin now: you should never design something for the web with only one (or even a narrow set) of particular viewport sizes in mind. It’s just so darn tempting to think that way. You have a couple of pretty specific screen sizes in front of you right now, and you likely design toward those to some degree. Design tools often ask you to draw a rectangle that represents a screen to design for. Testing tools sometimes show you a site at a set of pre-set screen sizes. It can feel normal and fine to design toward, say, three sizes and hone in on them. Honestly, that might end up working fine, but it might not! It might lead to some awkward in-betweens, especially if you are very rigid in writing CSS that only changes at those specific breakpoints.
That’s the thing, really. You just don’t have to think in really specific breakpoints anymore. Media query width breakpoints are still a fine tool, but now we’ve got viewport units, container units, container queries, calc/min/max/clamp, and all sorts of other stuff that allow you to design components and pages that work well and look good at the size and under the conditions they are in. It’s just a better way to code. But this stuff has only relatively recently arrived in CSS so it’ll take a minute for it all to settle in.
This isn’t even really new news. Over a decade ago, I was like, yo, there are a ton of different sizes that your site is getting viewed at. Deal with it. Now we can properly.
Have websites gone to crap? Browse around popular sites, and I think you’ll land on an easy yes. Especially on mobile, cripes. Just to name a few: they are too slow to load, the ads and popups are too obtrusive, and there is too much usage of fixed-position elements that reduce usable area.
This website User Inyerface satirized it recently, and it’s pretty funny (ya know, if being intentionally frustrated is your thing, gamers should relate).
People have been worried about this for ages, and it never seems to get any better.
- Brad Frost called it bullshit.
- A couple folks made termsandconditions.game mining these patterns for ideas.
- Guangyi Li’s how-i-experience-web-today.com gets at the heart of the complete experience.
- Tracking is a big part of this story, and clickclickclick.click pokes at that.
This all just makes me sad. Fortunately, most things are fine.
Have you seen the popover API? It’s a neat idea, already play-with-able in Chrome. Think styled tooltips. The idea is that you connect some interaction (click of a button) to toggling another element with more information or context. Amazingly, to me, this HTML totally works in Chrome with no CSS or JavaScript at all:
<button popovertarget="my-popover">Open Popover</button>
<div id="my-popover" popover>
<p>I am a popover with more information.</p>
</div>
You can style stuff with CSS of course, but the basics of the interaction work without. Like a <details> element.
Anytime we get any form of “state management” outside of JavaScript, the people will play! There are countless games made in CSS thanks to the whole idea of the :checked selector in CSS and using the ~ combinator to select other elements.
This time, leave it to Garth Heyes who has made Tic-Tac-Toe entirely in HTML only. That’s gotta be a first.
Wanna see it? Fair warning first: it’s 170 MB (!!) of HTML and “over half a million nodes”. Chrome really struggles with this. It took my machine maybe near a minute to even render the first page, and each click took a while as well. If you’re down to try it, see the demo.
So now that we’ve looked at something you absolutely shouldn’t do on the web, here’s Heather Buchel with some things you absolutely should do on the web. Heather ain’t even mad that we’re building websites with newfangled tech and trying to share code across platforms and all that, but, just, like, don’t break stuff. Don’t break super duper basic stuff that websites easily do and are good for everyone. I’ll hijack her whole list, but of course go read it for more context:
- Let me copy text so I can paste it.
- If something navigates like a link, let me do link things.
- Let me zoom in on my browser without the website getting all out of whack.
- Do responsive things.
- Let me have hover styles.
- If the UI completely changes when I click on something, as if I’ve navigated to a new page, give me a browser history update and a new URL.
- Let me see scroll bars.
- Stop hijacking my typical browser shortcuts for use in your own app.
Reasonable asks, no?
Onnnnneeee more thing you should be really careful about doing on the web. Adam Silver: The problem with sticky menus and what to do instead.
One problem is fairly obvious with sticky menus: they overlap stuff! They get in the dang way far too often.
But there are other things that cause problems that you might not see right away. Adam mentions zooming. One little zoom or two might kick a sticky/fixed element right off the page. Also, if something opens a sticky menu, and that menu happens to be taller than the viewport, you’ve got issues. You either need that area to be scrollable (but nested scrolling sucks), or you require users to scroll likely further than they want to just to see more of the menu. Ughghadk.
Adam lists three more that are just as bad or worse, and even less obvious at first glance. I’ll force you over there to see them. But I’ll snag the good ending, featuring the alternatives:
- Keep pages short: Sticky menus are a symptom of long pages so fix the root cause.
- Just let users scroll: It’s a myth that scrolling is a problem. Even on mobile, the top of the page is a flick or 2 away mostly.
- Put relevant links in context: For example, add a subscribe form to the end of a post or add a CTA to a pricing section.
- Use a back-to-top link: They’re relatively unobtrusive (but only do this once you exhaust the other options).
Send Time Optimization
Did you know that email Send Time Optimization (STO) can improve the open rate by up to 93%? Awesome! Or it might only be 10%. A slightly more credible case study claims that message delivery at the right time resulted in an open rate of 55%, a click rate of 30%, and a conversion rate of 13%. I’ll take that increase any day if there’s a positive ROI.
Optimization can be applied to any number of problems. It can be applied equally to content, where it may be to the customer’s benefit, as it can be applied to price, where optimization can deliver the maximum possible price for merchants.
How to get more traffic to a website??
Hi everyone!
I cannot drive traffic to my website. The backlinks that I am creating are not getting indexed, hence I am not getting any referral traffic or benefit from the backlinks.
Can someone help me out? Also, how can I index my backlink on Wikipedia?
Thanks!
Graphs and Language
A rising tide lifts all boats, and the recent advances in LLMs are no exception. In this blog post, we will explore how Knowledge Graphs can benefit from LLMs, and vice versa.
In particular, Knowledge Graphs can ground LLMs with facts using Graph RAG, which can be cheaper than Vector RAG. We'll look at a 10-line code example in LlamaIndex and see how easy it is to start. LLMs can help build automated KGs, which have been a bottleneck in the past. Graphs can provide your Domain Experts with an interface to supervise your AI systems.
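For a rough idea of what that looks like in code, here is a sketch of building and querying a Knowledge Graph index with LlamaIndex. Module paths and defaults shift between LlamaIndex releases, so treat the imports and the ./data folder as assumptions rather than a drop-in recipe.
#python
# A sketch of Graph RAG with LlamaIndex; import paths vary between releases.
from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex, StorageContext
from llama_index.core.graph_stores import SimpleGraphStore

# Any folder of text files works here; "./data" is a placeholder.
documents = SimpleDirectoryReader("./data").load_data()

graph_store = SimpleGraphStore()
storage_context = StorageContext.from_defaults(graph_store=graph_store)

# The LLM extracts (subject, relation, object) triplets and builds the Knowledge Graph.
index = KnowledgeGraphIndex.from_documents(
    documents,
    max_triplets_per_chunk=2,
    storage_context=storage_context,
)

# Query the graph instead of (or alongside) a vector store.
query_engine = index.as_query_engine(include_text=False, response_mode="tree_summarize")
print(query_engine.query("How are the main entities in these documents related?"))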
Our First Look at the Upcoming Twenty Twenty-Five Default WordPress Theme
Free Blender Brushes: Enhance Your 3D Art with Essential Tools
If you’re a 3D artist, you know that Blender is one of the most powerful and versatile tools for creating stunning visual art. To truly unleash its potential, having a diverse set of free Blender brushes can make a world of difference. In this article, we’ve compiled a list of the best free Blender brushes that can help you elevate your art to the next level.
Why You Need Free Blender Brushes
Free Blender brushes are essential for any artist looking to add detail and texture to their models. Whether you’re working on character design, environment creation, or any other 3D project, the right brushes can save you time and enhance the quality of your work. By incorporating these tools into your workflow, you can achieve more realistic and intricate results with ease.
How to Download Blender Brushes and Install Them?
Installing new brushes in Blender is straightforward. Follow these steps to get started:
- Download the Brush Pack: Ensure you download the brushes in a .zip or .blend file.
- Open Blender: Start a new project or open an existing one.
- Navigate to the Brush Settings: Go to the ‘Texture Paint’ or ‘Sculpt’ mode.
- Import Brushes: In the ‘Brush’ panel, click on ‘Add Brush’ and navigate to your downloaded files.
- Save User Settings: Save your preferences to keep the brushes available for future projects.
Enhancing your 3D models in Blender is easier than ever with these free Blender brushes. Whether you’re looking to add fine details, realistic textures, or intricate patterns, there’s a brush for every need. Download these brushes today and take your art to the next level!
See also
Sculpting Brushes for Blender
High Resolution Skin Brushes For Blender
ER Wood Brush
Rock Brushes for Blender
Organic Skin Brushes for Large Animals
Free Environment Sculpting Brushes
Rock Sculpt Brushes for Blender
Blender Grease Pencil Brush Pack
Basic Brushpack for Blender Texture Painting
Zbrush Orb Stylized Brushes Pack
Landscape Brushes for Blender
Sculpting Brushes
Blender 2.8 Brushpack
Blender Grease Pencil Crayon Brush
Blender Grease Pencil Brushes
Legit Customer Reviews of Talkliv
Building a Strong Brand Presence on Amazon
Uncovering Thousands of Unique Secrets in PyPI Packages
Let’s start with the big reveal of what we found:
- 3,938 total unique secrets across all projects
- 768 of those unique secrets were found to be valid
- 2,922 projects contained at least one unique secret
To put those numbers in perspective, there are over 450,000 projects released through the PyPI website, containing over 9.4 million files. There have been over 5 million released versions of these packages. If we add up all the secrets shared across all the releases, we found 56,866 occurrences of secrets, meaning once a secret enters a project, it is often included in multiple releases.
How to Do an LLC Name Search in 5 Simple Steps
Want to just get started? Click here to sign up for LegalZoom and do an LLC name search today. Forming …
Linking to an Image Folder Within a WordPress Theme
During WordPress theme development, you will more than likely need to display some images that are located within your theme directory. The location of the images folder can vary greatly. It really comes down to how you like to set things up. Generally speaking, if you were to have an images folder in the root of your theme, you can …
Bard: The New ChatGPT Competitor
In its constant quest to optimize the user experience in artificial intelligence, Google has introduced Bard, its latest and most advanced conversational system.
This innovative tool not only promises to stay up-to-date thanks to its permanent connection to the Internet, distinguishing it from other systems such as ChatGPT, but it also seeks to revolutionize the way we interact with technology. From its ability to interpret and describe images to its promising integration with other leading services such as Gmail, Docs, and Google Lens, Bard is shaping up to be the central nexus in Google’s service ecosystem. Moreover, its collaboration with Adobe Firefly suggests a horizon where the generation and understanding of visual content reach unprecedented levels. Although still in an experimental phase, Bard promises to redefine the boundaries of what we expect from an AI system.
Automattic and Woo Help Maximize Agency Potential
How to Optimize Your Business For Local Search and Social Marketing
If you have a local business or one that targets audiences in a specific geo-location, then you will want to …
CSS Grid Tutorials for Beginners: Top Resources to Master Layouts
In the world of web development, mastering layout techniques is crucial. One of the most powerful tools at your disposal is CSS Grid. If you’re new to CSS Grid and looking for the best tutorials to get started, you’re in the right place. This guide will walk you through top resources that can help you understand and effectively use CSS Grid in your web projects.
Why Learn CSS Grid?
CSS Grid is a two-dimensional layout system for the web. It allows you to create complex layouts more easily and with cleaner code than older methods like floats and positioning. By learning CSS Grid, you can:
- Create responsive layouts that adapt to different screen sizes.
- Simplify the design process by reducing the need for external libraries.
- Enhance your CSS skills and make your projects stand out.
Tips for Mastering CSS Grid
- Start Small: Begin with simple layouts and gradually move to more complex designs. This approach helps you build a solid understanding without feeling overwhelmed.
- Use Browser Developer Tools: Most modern browsers have built-in tools that allow you to inspect and experiment with CSS Grid properties. These tools are invaluable for learning and debugging.
- Practice Regularly: Like any other skill, mastering CSS Grid requires practice. Try to incorporate it into your projects, and don’t hesitate to experiment with different layouts.
- Join Online Communities: Platforms like Stack Overflow, Reddit, and various developer forums can provide support and feedback as you learn. Engaging with a community can accelerate your learning process.
Learning CSS Grid is a valuable investment for any web developer. With these top tutorials and resources, you’ll be well on your way to mastering CSS Grid and creating stunning, responsive web layouts. Remember to practice regularly, leverage online communities, and keep exploring new techniques to refine your skills.
CSS Grid Tutorials and Resources
See also
CSS Grid Tutorials and Text Guides
A Comprehensive Guide to CSS Grid Design
CSS Grid Layout Tutorial
Exploring CSS Grid: A Beginner’s Guide
Exploring the Basics of CSS Grid Layout
A Step-by-Step Guide to Learning CSS Grid
CSS Grid Text Guide
The Best Guide to Understanding CSS Grid Layout
An Interactive Guide to CSS Grid Design
A Beginner’s Guide to Using the CSS Grid
Guide to CSS Grid Layout Module
How to Master CSS Grid in 5 Minutes?
Mastering CSS Grid
CSS Grid Tutorial with Cheat Sheet
CSS Grid Game
Grid Garden
Video Tutorials
The Ultimate Guide to the CSS Grid Crash Course
How to Utilize CSS Grid the Easy Way
Learn CSS Grid in 20 Minutes
CSS Grid Course: A Step-by-Step Guide
How to Master the CSS Grid in 100 Seconds
Cheat Sheets
Visual Cheat sheet for CSS Grid Layout
Ultimate CSS Grid Cheat Sheet
CSS Grid Cheatsheet
CSS Grid Cheat Sheet