
Communications

Automated email ingestion pipeline with AI spam filtering, attachment processing, and agent-driven document handling


Overview#

Artifi's communications system provides a complete email ingestion and processing pipeline. Inbound emails are automatically classified, their attachments are extracted and analyzed, and AI agents process the contents -- turning vendor invoices into posted transactions, extracting receipt data, and more.

The system is designed around a zero-friction, fully automated flow:

Email received at entity address
      |
Spam classification (AI-powered)
      |
Attachments stored securely
      |
Agent processes content automatically
      |
Transaction posted via workflow
      |
Visible in Admin Dashboard Inbox

Email Address Format#

Each legal entity gets dedicated email addresses for different purposes:

{purpose}@{entity_code}.{org_code}.mail.arfiti.com

Examples:

Address	Purpose	Entity
`invoices@warehouse.acme.mail.arfiti.com`	Vendor bills	Warehouse Ops
`receipts@hq.acme.mail.arfiti.com`	Expense receipts	Headquarters
`statements@main.acme.mail.arfiti.com`	Bank statements	Main entity

Each address maps to a specific legal entity and a default processing agent, determining how incoming emails are handled automatically.
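The address format above can be split mechanically. The following is a minimal sketch, assuming exactly one entity label and one organization label before the `.mail.arfiti.com` suffix; the `ParsedAddress` type and `parse_entity_address` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

MAIL_SUFFIX = ".mail.arfiti.com"  # suffix taken from the documented address format


@dataclass(frozen=True)
class ParsedAddress:
    purpose: str
    entity_code: str
    org_code: str


def parse_entity_address(address: str) -> ParsedAddress:
    """Split {purpose}@{entity_code}.{org_code}.mail.arfiti.com into its parts."""
    local, sep, domain = address.strip().lower().partition("@")
    if not sep or not local:
        raise ValueError(f"not an email address: {address!r}")
    if not domain.endswith(MAIL_SUFFIX):
        raise ValueError(f"unexpected mail domain: {domain!r}")
    # What remains before the suffix should be "entity_code.org_code".
    labels = domain[: -len(MAIL_SUFFIX)].split(".")
    if len(labels) != 2 or not all(labels):
        raise ValueError(f"expected entity_code.org_code, got: {domain!r}")
    entity_code, org_code = labels
    return ParsedAddress(local, entity_code, org_code)
```

For example, `parse_entity_address("invoices@warehouse.acme.mail.arfiti.com")` yields purpose `invoices`, entity code `warehouse`, and organization code `acme`.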

Inbound Email Pipeline#

Processing Flow#

Email received -- The mail provider receives the email and sends a webhook notification
Signature verification -- Webhook authenticity is verified using cryptographic signatures
Address resolution -- The recipient address is parsed to determine the organization, entity, and purpose
Spam classification -- AI classifies the email as legitimate, spam, or uncertain
Attachment storage -- Attachments are downloaded and stored in secure cloud storage
Agent event creation -- A processing event is created for the appropriate agent
Agent processing -- The agent analyzes attachments, resolves entities, and posts transactions

Spam Classification#

Every inbound email is classified using AI before agent processing:

Classification	Confidence	Action
Legitimate	High (>= 70%)	Accept and create processing event
Spam	Very high (>= 85%)	Reject and store audit record
Uncertain	Moderate (70-85%)	Accept but flag for review
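One way to read the table above is sketched here. It assumes the classifier returns a label together with a confidence between 0 and 1, which the documentation does not spell out; `Decision` and `decide` are hypothetical names, and anything that clears neither threshold is treated as uncertain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    classification: str  # "legitimate", "spam", or "uncertain"
    accept: bool
    flag_for_review: bool


def decide(label: str, confidence: float,
           spam_threshold: float = 0.85,
           legitimate_threshold: float = 0.70) -> Decision:
    """Map a (label, confidence) pair to an action. Assumed interpretation."""
    if label == "spam" and confidence >= spam_threshold:
        # Confident spam: reject (the audit record is stored elsewhere).
        return Decision("spam", accept=False, flag_for_review=False)
    if label == "legitimate" and confidence >= legitimate_threshold:
        return Decision("legitimate", accept=True, flag_for_review=False)
    # Everything else is accepted but flagged, so no document is lost.
    return Decision("uncertain", accept=True, flag_for_review=True)
```

Under this reading, a spam verdict at 0.80 confidence is not rejected; it is accepted and flagged for review.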

Fail-Open Design#

The spam filter is designed to never lose legitimate business documents. If the classification system encounters any error (timeout, API issue, budget exceeded), the email is accepted and processed normally. This is a deliberate safety choice -- a false negative (spam getting through) is far less costly than losing a real invoice.
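The fail-open behavior can be illustrated with a small wrapper. This is a sketch only: the `classifier` callable and the returned dictionary shape are assumptions, and the 0.85 rejection threshold is the documented default.

```python
from typing import Callable, Tuple


def classify_fail_open(classifier: Callable[[str], Tuple[str, float]],
                       email_text: str) -> dict:
    """Run the classifier, accepting the email if classification fails."""
    try:
        label, confidence = classifier(email_text)
    except Exception as exc:  # timeout, API issue, budget exceeded, ...
        # Fail open: the email continues to agent processing unclassified.
        return {"accepted": True, "classification": None, "error": repr(exc)}
    rejected = label == "spam" and confidence >= 0.85
    return {"accepted": not rejected, "classification": label, "error": None}
```

Catching every exception is deliberate here: any classifier failure leads to acceptance, which trades some spam getting through for never dropping a real invoice.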

Configurable Settings#

Setting	Default	Description
Monthly budget	$10.00	Maximum classification spend per month
Spam threshold	0.85	Minimum confidence to reject as spam
Legitimate threshold	0.70	Minimum confidence to mark as legitimate
Fail open	Enabled	Accept emails on classification errors

Settings are configurable per organization, and the budget resets automatically each month.
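The settings and the monthly reset could be modeled as below. This is a hypothetical sketch: the field names, the `BudgetTracker` class, and the reset-on-calendar-month rule are assumptions; only the default values come from the table above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass
class SpamFilterSettings:
    monthly_budget_usd: float = 10.00
    spam_threshold: float = 0.85
    legitimate_threshold: float = 0.70
    fail_open: bool = True


@dataclass
class BudgetTracker:
    settings: SpamFilterSettings
    period: Tuple[int, int] = (0, 0)  # (year, month) of the current window
    spent_usd: float = 0.0

    def try_spend(self, cost_usd: float, today: date) -> bool:
        """Record a classification cost; return False if it would exceed the budget."""
        current = (today.year, today.month)
        if current != self.period:
            # A new calendar month started: reset the running total.
            self.period, self.spent_usd = current, 0.0
        if self.spent_usd + cost_usd > self.settings.monthly_budget_usd:
            return False
        self.spent_usd += cost_usd
        return True
```

When `try_spend` returns `False` and fail open is enabled, the email would be accepted without classification, consistent with the fail-open design.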

Attachment Processing#

Storage#

When an email contains attachments, they are:

Downloaded from the email provider
Uploaded to secure cloud storage (Cloudflare R2)
Recorded with their metadata (filename, content type, size, content hash)

If cloud upload fails, the system falls back to storing the provider's reference URL.
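The storage steps and the fallback can be sketched as follows. The `upload` callable stands in for whatever client writes to cloud storage; SHA-256 as the content hash and the `attachments/...` key layout are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class AttachmentRecord:
    filename: str
    content_type: str
    size: int
    content_hash: str            # SHA-256 hex digest (assumed algorithm)
    storage_key: Optional[str]   # set when the cloud upload succeeded
    provider_url: Optional[str]  # fallback reference when it did not


def store_attachment(filename: str, content_type: str, data: bytes,
                     provider_url: str,
                     upload: Callable[[str, bytes], None]) -> AttachmentRecord:
    """Upload an attachment, falling back to the provider URL on failure."""
    digest = hashlib.sha256(data).hexdigest()
    key = f"attachments/{digest}/{filename}"  # hypothetical key layout
    try:
        upload(key, data)
    except Exception:
        # Upload failed: keep the provider's reference so the file stays reachable.
        return AttachmentRecord(filename, content_type, len(data), digest, None, provider_url)
    return AttachmentRecord(filename, content_type, len(data), digest, key, None)
```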

Content Extraction#

Agents extract content from attachments on-demand using AI:

Content Type	Method	Cost	Features
PDF	Document API	~$0.02/page	Text extraction, OCR for scanned documents
Images (JPEG, PNG)	Vision API	~$0.01/image	Receipt photos, scanned documents
Text files (TXT, CSV, HTML)	Direct read	Free	No AI needed
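The table above amounts to a dispatch on content type. A rough sketch follows; the MIME type strings, the method labels, and the `plan_extraction` helper are illustrative assumptions, and the unit costs are the approximate figures from the table.

```python
from typing import Tuple

# content type -> (method label, approximate USD cost per unit)
EXTRACTION_METHODS = {
    "application/pdf": ("document_api", 0.02),  # per page
    "image/jpeg": ("vision_api", 0.01),         # per image
    "image/png": ("vision_api", 0.01),
    "text/plain": ("direct_read", 0.0),
    "text/csv": ("direct_read", 0.0),
    "text/html": ("direct_read", 0.0),
}


def plan_extraction(content_type: str, pages: int = 1) -> Tuple[str, float]:
    """Return the extraction method and an estimated cost for one attachment."""
    base = content_type.split(";")[0].strip().lower()  # drop "; charset=..." parameters
    if base not in EXTRACTION_METHODS:
        raise ValueError(f"unsupported content type: {content_type!r}")
    method, unit_cost = EXTRACTION_METHODS[base]
    units = pages if method == "document_api" else 1  # only PDFs are billed per page
    return method, round(unit_cost * units, 4)
```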

Extraction Output#

When an agent reads an attachment, it receives:

Full extracted text (with English translation if the original is non-English)
Detected language
Extracted amounts (total, subtotal, tax, currency)
Structured metadata (vendor name, invoice number, dates)
Line items with descriptions, quantities, and amounts
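The fields listed above could be carried in a structure like the one below. This is a hypothetical shape, not the actual schema: the field names are assumptions, and the `totals_consistent` check is an illustrative addition that an agent might run before posting.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    amount: Decimal


@dataclass
class ExtractionResult:
    text: str                      # full text, translated to English if needed
    detected_language: str
    total: Optional[Decimal] = None
    subtotal: Optional[Decimal] = None
    tax: Optional[Decimal] = None
    currency: Optional[str] = None
    vendor_name: Optional[str] = None
    invoice_number: Optional[str] = None
    invoice_date: Optional[str] = None
    line_items: List[LineItem] = field(default_factory=list)

    def totals_consistent(self) -> bool:
        """True when subtotal + tax equals total; False if any amount is missing."""
        if any(v is None for v in (self.total, self.subtotal, self.tax)):
            return False
        return self.subtotal + self.tax == self.total
```

`Decimal` is used for amounts so that currency arithmetic is not subject to binary floating-point rounding.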

Cost Efficiency#

Content extraction happens only when an agent decides to read an attachment. Spam emails incur zero extraction cost because the agent never processes them. For a typical month with 100 emails (50% spam), extraction costs are approximately $1-2.

Agent Integration#

Bill Processor Agent#

The primary consumer of inbound emails is the bill processing agent. When it receives an email event, it:

Parses the email metadata (sender, subject, attachments)
For each attachment:
- Extracts content using AI
- Resolves the vendor (from memory, database, or creates inline)
- Loads reference data (accounts, tax codes, payment terms)
- Classifies line items (expense, prepaid, or asset)
- Posts the transaction through the workflow system
- Links the source attachment to the transaction
- Checks for items that should be capitalized as fixed assets

Processing cost per bill: Approximately $0.003-0.005 (using efficient AI models)

Other Agent Types#

The email system supports multiple agent types for different purposes:

Bank Statement Processor -- Extract transactions from statement PDFs
Receipt Processor -- Extract expense data from receipt photos
AR Invoice Processor -- Process incoming customer documents

Admin Dashboard Inbox#

Inbox View#

The communications inbox provides a comprehensive view of all email activity:

Stats bar: Unprocessed, processed, spam blocked, and total counts (30 days)
Filters: Direction (inbound/outbound), status (pending/processing/completed/failed/archived), classification (spam/legitimate/uncertain)
Search: Across subject, sender, and recipient addresses
Pagination: 50 items per page

Each row shows the subject, sender/recipient, classification badge with confidence score, processing status, attachment count, and date.

Email Detail View#

Clicking an email opens a detailed view with:

Metadata panel: Sender, recipients, CC, received date, processed date
Classification details: Result, confidence score, reasoning
Processing status: Current status and any error messages
Agent execution: Agent type, instance ID, number of attempts
Body viewer: Tabs for text and HTML views
Attachment list: Files with storage location and preview capability
PDF/Image preview: Collapsible preview panel for quick review
Created documents: Bills and invoices linked to this email
Actions: Mark as Real (override spam), Retry Processing, Archive

Key Actions#

Action	Purpose
Mark as Real	Override a spam classification to legitimate and trigger processing
Retry Processing	Re-queue a failed email for agent processing
Archive	Remove an email from the active inbox

Outbound Email#

The system also supports sending emails for:

Agent notification emails (anomaly alerts, processing summaries)
Team invitation emails
System alerts

Outbound emails are sent through the Resend API with proper authentication (SPF, DKIM, DMARC).

Entity Email Management#

Each legal entity can have multiple email addresses for different purposes:

Property	Description
Email address	Full address (unique among active addresses)
Email type	Purpose: bills, receipts, statements, etc.
Inbound/Outbound	Whether the address accepts and/or sends emails
Default agent	Which agent processes emails at this address
Routing rules	Configuration for auto-processing and approval requirements

When an email arrives, the system looks up the routing configuration to determine the entity, email type, and processing agent.
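The lookup described above might look like this. It is a sketch under assumptions: the `EntityEmail` fields mirror the property table, while the in-memory list, the `resolve_route` function, and the agent name used in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EntityEmail:
    address: str
    entity: str
    email_type: str      # bills, receipts, statements, ...
    inbound: bool
    outbound: bool
    default_agent: str
    active: bool = True


def resolve_route(recipient: str, table: Iterable[EntityEmail]) -> Optional[EntityEmail]:
    """Find the active inbound address matching the recipient, if any."""
    wanted = recipient.strip().lower()
    for row in table:
        if row.active and row.inbound and row.address.lower() == wanted:
            return row
    return None  # unknown or inactive address: no route
```

Given a row for `invoices@warehouse.acme.mail.arfiti.com` with default agent `bill_processor`, a call to `resolve_route` with that recipient returns the row, and the caller can read the entity, email type, and agent from it.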

Monthly Cost Example#

For an organization receiving 100 emails per month (50% spam):

Item	Cost
Spam classification (100 emails)	~$0.10
PDF extraction (50 invoices)	~$1.00-1.50
Bill processing AI calls	~$0.15-0.25
Total	~$1.25-1.85
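The totals in the table can be reproduced from per-unit figures. In this sketch the per-email classification cost (~$0.001) and the range of 1 to 1.5 pages per invoice are inferred from the table rows rather than stated in the source; the per-page and per-bill costs are the documented approximations.

```python
from decimal import Decimal as D


def monthly_cost(emails: int = 100, spam_share: D = D("0.5")):
    """Return (low, high) estimated monthly cost in USD."""
    invoices = int(emails * (1 - spam_share))          # legitimate emails, one PDF each
    spam = emails * D("0.001")                         # inferred: ~$0.10 per 100 emails
    pdf_low = invoices * D("0.02") * D("1")            # ~$0.02/page, 1 page each
    pdf_high = invoices * D("0.02") * D("1.5")         # inferred: 1.5 pages on average
    bills_low = invoices * D("0.003")                  # ~$0.003-0.005 per bill
    bills_high = invoices * D("0.005")
    return spam + pdf_low + bills_low, spam + pdf_high + bills_high


low, high = monthly_cost()
print(f"${low:.2f} - ${high:.2f}")  # prints "$1.25 - $1.85"
```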

Security and Compliance#

Email Security#

Webhook signatures verified cryptographically on every inbound email
SPF/DKIM validation handled by the email provider before delivery
Webhooks that fail signature verification are rejected
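The source does not say which signature scheme the mail provider uses. As an illustration only, the sketch below assumes an HMAC-SHA256 signature over the raw request body, sent as a hex string and checked against a shared secret; real providers may add timestamps or use a different scheme, so their documentation takes precedence.

```python
import hashlib
import hmac


def verify_webhook(secret: bytes, payload: bytes, signature_hex: str) -> bool:
    """Check an HMAC-SHA256 hex signature over the raw body (assumed scheme)."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the signature matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

A handler would call this on the unparsed request body before doing any other work and reject the request when it returns `False`.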

Agent Security#

Agents submit transactions through the same workflow system as human users
Same approval lanes apply (risk-based routing)
Agent permissions are scoped to specific workflows and tools
All agent actions are logged with full traceability

Data Privacy#

Area	Protection
Spam classification	Only headers and the first 1,000 characters are sent to the AI
File storage	Private bucket, presigned URLs, organization-scoped, encrypted at rest
Email retention	Configurable retention period (default 90 days)
Credentials	Stored in environment variables, never in the database
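The spam classification limit in the table can be sketched as a small payload builder. It assumes the 1,000 characters are taken from the start of the message body, which the table does not state explicitly; the function name and payload keys are hypothetical.

```python
from typing import Dict

MAX_BODY_CHARS = 1000  # limit from the data privacy table


def build_classifier_input(headers: Dict[str, str], body: str) -> dict:
    """Build the minimal payload sent for spam classification."""
    return {
        "headers": dict(headers),              # copy so the caller's dict is untouched
        "body_excerpt": body[:MAX_BODY_CHARS], # the rest of the body is never sent
    }
```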
