This week I want to post something a little more technical; I hope people will appreciate it. For those who are technically inclined, this post shows a simple Python script that summarises all new unread e-mails in your Gmail account. Hopefully we will see this feature added to Gmail soon (most likely using Bard rather than OpenAI's ChatGPT).
OpenAI ChatGPT can be used for many tasks, and one of them is to summarise text. For example, it can easily create a bullet point list with all major topics of an article, a post or an e-mail.
In this post I am going to show how to use it, via the API and a Python script, to create a tool that reads all your new unread e-mails, summarises each of them in a few bullet points, and sends you a summary of all of them in a new e-mail message.
I am relying heavily on a post by Trent Leslie (the link is in the comments at the top of the code below), which can be used as a reference, but I will provide explicit Python code that anyone can run.
There are a couple of tasks left to the reader:
Get an OpenAI API key
Get Google Cloud credentials for the Gmail API
The OpenAI key needs to be saved as plain text in a file in the same directory as the Python script (alternatively, put the full path to the file in the script), so it can be read when the script runs; this is safer than writing the key directly into the code. Similarly, you will need a credentials.json file from Google so the script can read your e-mails.
Please note that these steps are not free: setting up the accounts is, but using their features is not. Every time you run this tool you will incur a small expense, though it should only be a few pennies (unless you have several thousand unread e-mails).
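To get a rough feel for that expense, here is a back-of-the-envelope sketch. Both figures in it are assumptions of mine, not actual OpenAI prices: roughly 4 characters per token, and a hypothetical price of $0.002 per 1,000 tokens. Check the current pricing before relying on the result.

```python
# Rough cost estimate for summarising a batch of e-mails.
# ASSUMPTIONS (not from the post): ~4 characters per token and a
# hypothetical price per 1,000 tokens. Replace both with current figures.
CHARS_PER_TOKEN = 4
PRICE_PER_1K_TOKENS_USD = 0.002  # hypothetical placeholder price


def estimate_cost_usd(num_emails, avg_chars_per_email, summary_tokens=100):
    """Return an approximate cost in USD for summarising num_emails e-mails."""
    # Tokens sent to the model (the e-mail text)
    input_tokens = num_emails * avg_chars_per_email / CHARS_PER_TOKEN
    # Tokens returned by the model (the bullet-point summaries)
    output_tokens = num_emails * summary_tokens
    total_tokens = input_tokens + output_tokens
    return total_tokens / 1000 * PRICE_PER_1K_TOKENS_USD


# 50 unread e-mails of about 4,000 characters each
print(f"${estimate_cost_usd(50, 4000):.2f}")
```

Under these assumptions, 50 e-mails of about 4,000 characters each come to roughly 55,000 tokens, or about 11 cents.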
It is pretty easy to get your OpenAI API key, simply:
Go to openai.com
Create an account
On platform.openai.com, under Personal→View API keys, select Create new secret key
Copy and paste the new secret API key text and save it in a plain text file (e.g. openai.key)
Similarly, to read your Gmail through an API, you need to set up a project at console.cloud.google.com:
Go to console.cloud.google.com and create an account
Create a new project
Enable the Gmail API
Go to APIs & Services→OAuth consent screen and create the application
Go to APIs & Services→Credentials, select OAuth client ID, and choose Desktop app
Download the JSON credentials file and save it as credentials.json
With these two files in place, the first time you run the program you will be asked to log in to your Gmail account and grant access; the script then saves a token.json file so that later runs can reuse it.
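Before the first run, it can help to confirm that both files are where the script expects them. The following is a minimal sketch of such a check; check_setup is a hypothetical helper of mine, not part of the script, and it only verifies that the key file is non-empty and that credentials.json parses as JSON.

```python
import json
import os


def check_setup(key_file="openai.key", credentials_file="credentials.json"):
    """Return a list of problems found; an empty list means both files look usable."""
    problems = []

    # The OpenAI key file must exist and contain some non-blank text
    if not os.path.isfile(key_file):
        problems.append(f"missing OpenAI key file: {key_file}")
    else:
        with open(key_file, "r") as f:
            if not f.read().strip():
                problems.append(f"OpenAI key file is empty: {key_file}")

    # The Google credentials file must exist and parse as JSON
    if not os.path.isfile(credentials_file):
        problems.append(f"missing Google credentials file: {credentials_file}")
    else:
        try:
            with open(credentials_file, "r") as f:
                json.load(f)
        except json.JSONDecodeError:
            problems.append(f"not valid JSON: {credentials_file}")

    return problems


# Report anything that needs fixing before running the summariser
for problem in check_setup():
    print("Setup problem:", problem)
```

It does not check that the key is valid or that the Gmail API is enabled; those errors will only show up when the script actually calls the services.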
Something that can potentially make this tool very useful is that it will work for any language supported by OpenAI, not just English, without any modification.
Do you want to see an example of how your e-mails will be summarised? I am subscribed to the “French Institute Alliance Française” newsletter, and here is the summary produced for one of their e-mails.
From: "French Institute Alliance Française (FIAF)" <fiafnews@fiaf.org>
Subject: This Thu & Fri Only! Bombyx Mori, Part of Dance Reflections by Van Cleef & Arpels Festival
Timestamp: Tue, 31 Oct 2023 17:34:31 -0000
Link: https://mail.google.com/mail/u/0/#inbox/18b86cd1db931xxx
Summary:
- US premiere of Ola Maciejewska's dance work, Bombyx Mori
- Performances on Thu, Nov 2 & Fri, Nov 3 at 7:30pm
- Post-show talk on Fri, Nov 3; Tickets are selling fast.
The Python code is as follows. As written, it only summarises new unread e-mails from the past day (num_days = 1), but this can be changed in the code to any number of days.
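Before the full listing, here is a small standalone sketch of how the look-back window turns into a Gmail search query. build_unread_query and its now parameter are hypothetical names of mine (the script itself uses time_period and a hard-coded num_days), and this version formats the date as YYYY/MM/DD, the form shown in Gmail's search help, whereas the script uses dashes.

```python
from datetime import datetime, timedelta, timezone


def build_unread_query(num_days, now=None):
    """Build a Gmail search query for unread inbox mail from the last num_days days."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=num_days)
    # The after: operator takes a date, so the window is whole days, not hours
    return f"is:unread in:inbox after:{cutoff.strftime('%Y/%m/%d')}"


# A fixed "now" makes the result reproducible
print(build_unread_query(7, now=datetime(2023, 11, 3, tzinfo=timezone.utc)))
# is:unread in:inbox after:2023/10/27
```

Passing the number of days as an argument, rather than editing it inside the function, makes it easy to run the tool for the past week after a holiday.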
Have fun!
# -*- coding: utf-8 -*-
"""
@author: vzocca
"""
# Note: this script uses openai.ChatCompletion, which only exists in openai versions before 1.0
import openai
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import re
from bs4 import BeautifulSoup
import os
import base64
from datetime import datetime, timedelta
# tutorial and explanation of code can be found at
# https://medium.com/@trentleslie/the-ultimate-guide-to-creating-a-simple-email-summarizer-with-gmail-api-and-openai-chatgpt-api-37cedf847b23
token_file = "token.json"
credentials_file = "credentials.json"
openai_api_key_file = "openai.key"
email_address = 'someone@gmail.com'
def init_google_cloud_service():
    SCOPES = ['https://www.googleapis.com/auth/gmail.readonly', 'https://www.googleapis.com/auth/gmail.send', 'https://www.googleapis.com/auth/gmail.modify']
    creds = None
    # Reuse the saved token from a previous run, if there is one
    if os.path.exists(token_file):
        creds = Credentials.from_authorized_user_file(token_file, SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            # First run: open a browser window to log in and grant access
            flow = InstalledAppFlow.from_client_secrets_file(credentials_file, SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the new or refreshed token for the next run
        with open(token_file, 'w') as token:
            token.write(creds.to_json())
    service = build('gmail', 'v1', credentials=creds)
    return service
def read_chatgpt_key():
    keyfile = openai_api_key_file
    with open(keyfile, 'r') as f:
        key = f.read().strip()
    return key
def time_period(days):
    # Date `days` days ago (UTC), formatted as YYYY-MM-DD for Gmail's after: search operator
    cutoff = datetime.utcnow() - timedelta(days=days)
    cutoff_str = cutoff.strftime("%Y-%m-%d")
    return cutoff_str
def get_unread_emails(service):
    # This will only get unread e-mails from the last 1 day. It can be modified
    num_days = 1
    time_period_str = time_period(num_days)
    print(time_period_str)
    query = f"is:unread in:inbox after:{time_period_str}"
    response = service.users().messages().list(userId='me', q=query).execute()
    messages = []
    if 'messages' in response:
        messages.extend(response['messages'])
    # Follow the pagination tokens until every page of results has been read
    while 'nextPageToken' in response:
        page_token = response['nextPageToken']
        response = service.users().messages().list(userId='me', q=query, pageToken=page_token).execute()
        if 'messages' in response:
            messages.extend(response['messages'])
    return messages
def get_email_data(service, message_id):
    msg = service.users().messages().get(userId='me', id=message_id, format='full').execute()
    payload = msg['payload']
    headers = payload['headers']
    email_data = {'id': message_id}
    for header in headers:
        name = header['name']
        value = header['value']
        if name == 'From':
            email_data['from'] = value
        if name == 'Date':
            email_data['date'] = value
        if name == 'Subject':
            email_data['subject'] = value
    if 'parts' in payload:
        # Multipart message: keep the body of the last text/plain or text/html part
        parts = payload['parts']
        data = None
        for part in parts:
            if part['mimeType'] == 'text/plain':
                try:
                    data = part['body']['data']
                except KeyError:
                    data = ''
            elif part['mimeType'] == 'text/html':
                try:
                    data = part['body']['data']
                except KeyError:
                    data = ''
        if data is not None:
            text = base64.urlsafe_b64decode(data.encode('UTF-8')).decode('UTF-8')
            soup = BeautifulSoup(text, 'html.parser')
            clean_text = soup.get_text()
            clean_text = remove_hyperlinks(clean_text)
            email_data['text'] = clean_text
        else:
            # No text part found: fall back to the top-level body, if any
            try:
                data = payload['body']['data']
            except KeyError:
                data = ''
            text = base64.urlsafe_b64decode(data.encode('UTF-8')).decode('UTF-8')
            soup = BeautifulSoup(text, 'html.parser')
            clean_text = soup.get_text()
            clean_text = remove_hyperlinks(clean_text)
            email_data['text'] = clean_text
    else:
        # Single-part message: the body is directly in the payload
        data = payload['body']['data']
        text = base64.urlsafe_b64decode(data.encode('UTF-8')).decode('UTF-8')
        soup = BeautifulSoup(text, 'html.parser')
        email_data['text'] = soup.get_text()
    return email_data
def remove_hyperlinks(text):
    # Remove URLs starting with http/https
    text = re.sub(r'http\S+', '', text)
    # Remove any remaining words containing '.com', '.net' or '.org'
    text = re.sub(r'\S+\.com\S*', '', text)
    text = re.sub(r'\S+\.net\S*', '', text)
    text = re.sub(r'\S+\.org\S*', '', text)
    return text
def chunk_text(text, max_chars):
    # Group paragraphs into chunks of at most max_chars characters where possible
    paragraphs = re.split(r'\n+', text)
    chunks = []
    current_chunk = ''
    for paragraph in paragraphs:
        if len(current_chunk) + len(paragraph) <= max_chars:
            current_chunk += paragraph + '\n'
        else:
            # Skip the append when nothing has been collected yet, to avoid an empty chunk
            if current_chunk:
                chunks.append(current_chunk.strip())
            current_chunk = paragraph + '\n'
    if current_chunk:
        chunks.append(current_chunk.strip())
    return chunks
def create_email(sender, to, subject, body):
    message = MIMEMultipart()
    message['To'] = to
    message['From'] = sender
    message['Subject'] = subject
    message.attach(MIMEText(body, 'plain'))
    # The Gmail API expects the whole message base64url-encoded in a 'raw' field
    raw_message = base64.urlsafe_b64encode(message.as_bytes()).decode()
    return {'raw': raw_message}
def send_email(service, email):
    try:
        sent_message = service.users().messages().send(userId='me', body=email).execute()
        print('Successfully sent message.')
    except Exception as error:
        print(f'Error: {error} ')
        sent_message = None
    return sent_message
def mark_as_read_and_archive(service, message_id):
    service.users().messages().modify(
        userId='me',
        id=message_id,
        body={'removeLabelIds': ['UNREAD', 'INBOX']}
    ).execute()
def delete_message(service, message_id):
    # Despite its name, this only removes the INBOX label (archives the message); nothing is deleted
    service.users().messages().modify(
        userId='me',
        id=message_id,
        body={'removeLabelIds': ['INBOX']}
    ).execute()
def email_summariser(text):
    summary = ''
    # Split long e-mails into chunks small enough for the OpenAI model to process
    chunks = chunk_text(text, 3000)
    system_prompt = '''You are a language model that summarizes emails in 1-3 bullet points.'''
    style = "Powerpoint slide with 1-3 bullet points"
    max_retries = 3
    for chunk in chunks:
        user_input = f'''Summarize the following in less than 60 words and only using 1-3 bullet points: {chunk}'''
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": style},
            {"role": "assistant", "content": "On what topic?"},
            {"role": "user", "content": user_input}
        ]
        retry_count = 0
        retry_flag = True
        while retry_flag and retry_count < max_retries:
            try:
                # Call the ChatCompletion endpoint
                response = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=messages,
                    temperature=1,
                    top_p=1,
                    presence_penalty=0.5,
                    frequency_penalty=0.4
                )
                email_summary_chunk = response['choices'][0]['message']['content']
                summary = summary + email_summary_chunk
                retry_flag = False
            except Exception:
                # Try again; the chunk is skipped if all max_retries attempts fail
                retry_count += 1
    return summary
if __name__ == "__main__":
    service = init_google_cloud_service()
    openai.api_key = read_chatgpt_key()
    unread_emails = get_unread_emails(service)
    # initialise the string that will contain the email summaries
    email_summaries = ''
    for message in unread_emails:
        email_data = get_email_data(service, message['id'])
        email_data['text'] = remove_hyperlinks(email_data['text'])
        summary = email_summariser(email_data['text'])
        email_summaries += f"From: {email_data['from']}\n"
        email_summaries += f"Subject: {email_data['subject']}\n"
        email_summaries += f"Timestamp: {email_data['date']}\n"
        email_summaries += f"Link: https://mail.google.com/mail/u/0/#inbox/{message['id']}\n"
        email_summaries += f"Summary:\n{summary}\n\n\n"
    # Send all the summaries to yourself in a single e-mail
    composed_email = create_email(email_address, email_address, "Email Summaries", email_summaries)
    send_email(service, composed_email)
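One practical note on the listing: it calls openai.ChatCompletion.create, which belongs to the pre-1.0 openai Python package and was removed in version 1.0. If you have a newer version installed, you can either pin an older one or adapt the call. Below is a minimal sketch of the adapted call, not a drop-in replacement for the whole script. summarise_chunk and its client parameter are hypothetical names of mine; the model name and prompts are carried over from the script, and the sampling parameters are left at their defaults.

```python
def summarise_chunk(client, chunk, model="gpt-3.5-turbo", max_retries=3):
    """Summarise one chunk of text; return '' if every attempt fails.

    `client` is expected to be an openai.OpenAI instance (openai>=1.0).
    """
    messages = [
        {"role": "system",
         "content": "You are a language model that summarizes emails in 1-3 bullet points."},
        {"role": "user",
         "content": "Summarize the following in less than 60 words and only "
                    f"using 1-3 bullet points: {chunk}"},
    ]
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(model=model, messages=messages)
            return response.choices[0].message.content
        except Exception as error:  # broad on purpose, as in the script above
            print(f"Attempt {attempt + 1} failed: {error}")
    return ""


# Usage, assuming openai>=1.0 is installed and openai.key holds your key:
#   from openai import OpenAI
#   with open("openai.key") as f:
#       client = OpenAI(api_key=f.read().strip())
#   print(summarise_chunk(client, "Meeting moved to Friday at 10am."))
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object, without spending any API credit.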