Gmail Attachment Transcription Guide: For Audio And Video


Save Gmail to Google Drive is an add-on that automates downloading email messages and file attachments from Gmail to your Google Drive. It saves email messages as PDFs and keeps attachments in their original format.

But there’s more! In its most recent version, the add-on can also transcribe audio and video attachments in Gmail messages. It uses the Whisper API from OpenAI to convert audio and video into text files that are saved and organized in your Google Drive.

Here are the steps to convert audio and video attachments in Gmail to text:

Step 1: Install the Save Gmail to Google Drive add-on from the Google Workspace Marketplace. Once the add-on has been installed, open a new Google Sheet, then select the Extensions menu > Save Emails > Open App to launch it.

Step 2: Create a new workflow and provide the Gmail search criteria. The add-on will search the matching emails for any audio and video attachments that need to be transcribed.

OpenAI’s speech-to-text API supports a range of audio and video formats, including MP3, WAV, MP4, MPEG, and WEBM. The API accepts files up to 25 MB, and because Gmail caps attachments at 25 MB as well, attachments will generally fit within the transcription limit.
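The format and size constraints above can be checked before a file is sent for transcription. The following is a minimal sketch in plain JavaScript, not the add-on's own code: the helper name `canTranscribe`, the extension list (limited to the formats named above), and the reading of 25 MB as 25 * 1024 * 1024 bytes are all assumptions made for illustration.

```javascript
// Formats named in this guide (assumption: list kept short for illustration).
const SUPPORTED_EXTENSIONS = ['mp3', 'wav', 'mp4', 'mpeg', 'webm'];
// Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
const MAX_BYTES = 25 * 1024 * 1024;

// Hypothetical helper: returns true when a file looks safe to send for transcription.
const canTranscribe = (fileName, sizeInBytes) => {
  const parts = fileName.toLowerCase().split('.');
  // A name without a dot has no extension and is rejected.
  const extension = parts.length > 1 ? parts.pop() : '';
  return SUPPORTED_EXTENSIONS.includes(extension) && sizeInBytes <= MAX_BYTES;
};
```

In Apps Script, the two arguments could come from `file.getName()` and `file.getSize()` on a Drive file.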

Step 3: Choose the “Save Audio and Video Attachments as text” option on the following screen, then decide whether to save the transcript in text or PDF format.

Here’s a handy tip: use markers to customize the file name. For instance, if the file name contains the “Subject” and “Sender Email” markers, the add-on will automatically replace them with the email’s subject and the sender’s email address.
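To illustrate how marker substitution of this kind can work, here is a small sketch in plain JavaScript. It is not the add-on's implementation: the helper name `fillFileName` and the double-brace `{{Marker}}` syntax are assumptions chosen for the example, and the add-on's real marker syntax is whatever its interface shows.

```javascript
// Hypothetical helper: replaces {{Marker}} placeholders with values from an email.
// The double-brace syntax is an assumption made for this sketch.
const fillFileName = (template, values) =>
  template.replace(/\{\{\s*([^}]+?)\s*\}\}/g, (match, key) =>
    // Unknown markers are left untouched so mistakes stay visible in the file name.
    Object.prototype.hasOwnProperty.call(values, key) ? values[key] : match
  );
```

For example, calling `fillFileName('{{Subject}} - {{Sender Email}}', { Subject: 'Weekly call', 'Sender Email': 'amy@example.com' })` would produce a name built from the subject and the sender address.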

You must also enter your OpenAI API key so the add-on can use OpenAI’s transcription service. OpenAI charges $0.006 per minute of audio or video transcribed, rounded to the nearest second.
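The pricing above makes it easy to estimate a bill in advance. The sketch below assumes the $0.006 per minute rate quoted in this guide (rates can change) and uses a hypothetical helper name, `estimateCost`.

```javascript
// Price per minute quoted in this guide (assumption: still current when you read this).
const PRICE_PER_MINUTE_USD = 0.006;

// Hypothetical helper: estimates the transcription cost for a recording length in seconds.
const estimateCost = (durationSeconds) => {
  // Billing is described as rounded to the nearest second.
  const billedSeconds = Math.round(durationSeconds);
  return (billedSeconds / 60) * PRICE_PER_MINUTE_USD;
};
```

By this estimate, a 10-minute recording (600 seconds) would cost roughly $0.06.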

Once the workflow is set up, it runs in the background and transcribes attachments from incoming messages as they arrive in your inbox. To track the workflow’s progress, check the Google Sheet itself.

Managing your Gmail attachments has never been simpler thanks to the Save Gmail to Google Drive add-on and its transcription feature powered by OpenAI. Use it to save time and stay organized.


Want to transcribe audio and video files yourself using Google Apps Script and the OpenAI API? You can copy the Google Script source code below and use it in your own projects.

// Define the URL for the OpenAI audio transcription API
// (empty here; set it to the transcription endpoint from the OpenAI API reference)
const WHISPER_API_URL = '';
// Define your OpenAI API key
const OPENAI_API_KEY = 'sk-putyourownkeyhere';

// Create a function that transcribes audio given its file ID and language
const transcribeAudio = (fileId, language) => {
  // Get the audio file as a blob using the Google Drive API
  const audioBlob = DriveApp.getFileById(fileId).getBlob();

  // Send a POST request to the OpenAI API with the audio file
  const response = UrlFetchApp.fetch(WHISPER_API_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${OPENAI_API_KEY}`,
    },
    // An object payload containing a blob is sent as multipart form data
    payload: {
      model: 'whisper-1',
      file: audioBlob,
      response_format: 'text',
      language: language,
    },
  });

  // Retrieve the transcription from the API response and log it to the console
  const data = response.getContentText();
  Logger.log(data.trim());
  return data;
};

Be sure to replace the placeholder value of OPENAI_API_KEY with your own OpenAI API key. Also check that the audio or video file you want to transcribe is in your Google Drive and that you have at least view (read) access to it.

With this script, you can transcribe audio and video content from within your own projects and save time.
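A request like this can fail, for example because of an invalid key or an oversized file. One possible way to handle that, sketched below, is to pass `muteHttpExceptions: true` in the `UrlFetchApp.fetch` options so a failed request returns a response instead of throwing, and then inspect it. The helper name `readTranscript` is hypothetical, and the JSON error shape it looks for is an assumption about the API's error bodies.

```javascript
// Hypothetical helper: turns an HTTP response into transcript text or a thrown error.
// `response` only needs getResponseCode() and getContentText(),
// which the HTTPResponse object returned by UrlFetchApp.fetch provides.
const readTranscript = (response) => {
  const code = response.getResponseCode();
  const body = response.getContentText();
  if (code >= 200 && code < 300) {
    return body.trim();
  }
  let message = body;
  try {
    // Assumption: error bodies are JSON shaped like { error: { message: '...' } }.
    const parsed = JSON.parse(body);
    if (parsed && parsed.error && parsed.error.message) {
      message = parsed.error.message;
    }
  } catch (e) {
    // Body was not JSON; keep the raw text as the message.
  }
  throw new Error(`Transcription failed (${code}): ${message}`);
};
```

Because the helper only relies on two methods, it can be tried outside Apps Script with a stand-in object.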

Transcribing Large Audio and Video Files:

Keep in mind that the Whisper API has a 25 MB file size limit. If you have a larger audio file, you can use the Pydub Python library to divide the audio into smaller parts before sending them to the API for transcription.
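Before splitting, it helps to know how long each part can be while staying under the limit. The sketch below works this out for a constant-bitrate file. It is an illustration only: the helper name `planChunks`, the 10% safety margin, and the reading of 25 MB as 25 * 1024 * 1024 bytes are assumptions, and variable-bitrate files will only match approximately.

```javascript
// Assumption: 25 MB is treated as 25 * 1024 * 1024 bytes.
const LIMIT_BYTES = 25 * 1024 * 1024;

// Hypothetical helper: plans chunk boundaries (in seconds) for a constant-bitrate file.
// bitrateKbps is the audio bitrate in kilobits per second (1 kbit = 1000 bits).
const planChunks = (durationSeconds, bitrateKbps, safetyFactor = 0.9) => {
  const bytesPerSecond = (bitrateKbps * 1000) / 8;
  // Longest whole number of seconds that fits under the (reduced) size limit.
  const maxChunkSeconds = Math.floor((LIMIT_BYTES * safetyFactor) / bytesPerSecond);
  if (maxChunkSeconds < 1) {
    throw new Error('Bitrate too high to fit even one second under the limit');
  }
  const chunks = [];
  for (let start = 0; start < durationSeconds; start += maxChunkSeconds) {
    chunks.push({ start, end: Math.min(start + maxChunkSeconds, durationSeconds) });
  }
  return chunks;
};
```

Each `{ start, end }` pair can then be handed to whatever tool does the actual cutting.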

Large video files are not a problem either. Using FFmpeg, you can extract the audio track from the video, which produces a much smaller file that is suitable for transcription with the API.

Here is how to extract the audio from a video:

ffmpeg -i video.mp4 -vn -b:a 256k audio.mp3

This command extracts the audio track from your video file and saves it as audio.mp3.

To split a large audio file into manageable chunks, use the following command:

ffmpeg -i large_audio.mp3 -f segment -segment_time 60 -c copy output_%03d.mp3

This command divides your large_audio.mp3 file into segments of roughly 60 seconds each, named output_000.mp3, output_001.mp3, and so forth. The number of segments depends on the length of the file.
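Once each segment has been transcribed separately, the pieces need to be joined back together in order. A minimal sketch of that step follows. The helper name `mergeTranscripts` and the `{ name, text }` input shape are assumptions for illustration, and words cut at a segment boundary may still need manual cleanup.

```javascript
// Hypothetical helper: merges per-segment transcripts into one text, ordered by file name.
// segments is an array of { name, text } objects, e.g. { name: 'output_001.mp3', text: '...' }.
const mergeTranscripts = (segments) =>
  [...segments] // copy first so the caller's array is not reordered
    .sort((a, b) => a.name.localeCompare(b.name, undefined, { numeric: true }))
    .map((segment) => segment.text.trim())
    .filter((text) => text.length > 0) // skip silent or empty segments
    .join(' ');
```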

With these techniques, you can work around the Whisper API’s size limit and transcribe your audio and video content quickly and easily.
