
    An Unbiased Review of Snowflake’s Document AI

    By FinanceStarGate | April 16, 2025 | 8 Mins Read


    As data professionals, we’re comfortable with tabular data…

    Tabular data. Image by author.

    We can also handle words, JSON, XML feeds, and pictures of cats. But what about a cardboard box full of things like this?

    (Image by Annie Spratt, Unsplash)

    The information on this receipt wants so badly to be in a tabular database somewhere. Wouldn’t it be great if we could scan all of these, run them through an LLM, and save the results to a table?

    Lucky for us, we live in the era of Document AI. Document AI combines OCR with LLMs and allows us to build a bridge between the paper world and the digital database world.

    All the major cloud vendors have some version of this…

    Here I’ll share my thoughts on Snowflake’s Document AI. Aside from using Snowflake at work, I have no affiliation with Snowflake. They didn’t commission me to write this piece and I’m not part of any ambassador program. All of that is to say I can write an unbiased review of Snowflake’s Document AI.


    What is Document AI?

    Document AI allows users to quickly extract information from digital documents. When we say “documents” we mean images with words. Don’t confuse this with niche NoSQL things.

    The product combines OCR and LLM models so that a user can create a set of prompts and execute those prompts against a large collection of documents.

    Snowflake’s Document AI on a (scrubbed) resume. Image by author.

    LLMs and OCR both have room for error. Snowflake solved this by (1) banging their heads against OCR until it’s sharp (I see you, Snowflake developer) and (2) letting me fine-tune my LLM.

    Fine-tuning the Snowflake LLM feels a lot more like glamping than some rugged outdoor adventure. I review 20+ documents, hit the “train model” button, then rinse and repeat until performance is satisfactory. Am I even a data scientist anymore?

    Once the model is trained, I can run my prompts on 1000 documents at a time. I like to save the results to a table, but you could do whatever you want with the results in real time.
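
    To make “save the results to a table” concrete, here is a minimal Python sketch of flattening one model response into a single row. It assumes each response is a JSON object that maps every prompt name to a list of scored answers, plus a metadata block; that shape, the prompt names, and the file name are illustrative assumptions, so check them against your own output.

```python
import json

# Hypothetical response for one document. The prompt names
# ("patient_name", "expiration_date") are placeholders for your own prompts.
SAMPLE_RESPONSE = """
{
  "__documentMetadata": {"ocrScore": 0.92},
  "patient_name": [{"score": 0.98, "value": "Jane Doe"}],
  "expiration_date": [{"score": 0.81, "value": "06/30/2025"}]
}
"""


def flatten_response(raw_json, file_name):
    """Turn one nested response into a flat dict, one column per prompt."""
    parsed = json.loads(raw_json)
    row = {"file_name": file_name}
    for key, answers in parsed.items():
        if key.startswith("__"):
            # Metadata block: keep the OCR score and skip the rest.
            row["ocr_score"] = answers.get("ocrScore")
            continue
        # Keep the highest-scoring answer; use None when there is no answer.
        best = max(answers, key=lambda a: a.get("score", 0.0), default=None)
        row[key] = best.get("value") if best else None
        row[key + "_score"] = best.get("score") if best else None
    return row


row = flatten_response(SAMPLE_RESPONSE, "flu_shot_001.pdf")
print(row)
```

    Keeping the score next to each value makes it easy to filter low-confidence answers later, whatever you decide to do with the row.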


    Why does it matter? 

    This product is cool for a few reasons.

    • You can build a bridge between the paper and digital world. I never thought the big box of paper invoices under my desk would make it into my cloud data warehouse, but now it can. Scan the paper invoice, upload it to Snowflake, run my Document AI model, and wham! I have my desired information parsed into a tidy table.
    • It’s frighteningly convenient to invoke a machine-learning model via SQL. Why didn’t we think of this sooner? In the old days this was a few hundred lines of code to load the raw data (SQL >> python/spark/etc.), clean it, engineer features, train/test split, train a model, make predictions, and then often write the predictions back into SQL.
    • To build this in-house would be a major undertaking. Yes, OCR has been around a long time but can still be finicky. Fine-tuning an LLM obviously hasn’t been around too long, but is getting easier by the week. To piece these together in a way that achieves high accuracy for a variety of documents could take a long time to hack on your own. Months and months of polish.

    Of course, some elements are still built in house. Once I extract information from the document, I have to figure out what to do with that information. That’s relatively quick work, though.


    Our Use Case: Bring on Flu Season

    I work at a company called IntelyCare. We operate in the healthcare staffing space, which means we help hospitals, nursing homes, and rehab centers find quality clinicians for individual shifts, extended contracts, or full-time/part-time engagements.

    Many of our facilities require clinicians to have an up-to-date flu shot. Last year, our clinicians submitted over 10,000 flu shots in addition to hundreds of thousands of other documents. We manually reviewed all of these to ensure validity. Part of the joy of working in the healthcare staffing world!

    Spoiler Alert: Using Document AI, we were able to reduce the number of flu-shot documents needing manual review by ~50%, and all in just a couple of weeks.

    To pull this off, we did the following:

    • Uploaded a pile of flu-shot documents to Snowflake.
    • Massaged the prompts, trained the model, massaged the prompts some more, retrained the model some more…
    • Built out the logic to compare the model output against the clinician’s profile (e.g. do the names match?). Definitely some trial and error here with formatting names, dates, etc.
    • Built out the “decision logic” to either approve the document or send it back to the humans.
    • Tested the full pipeline on a bigger pile of manually reviewed documents. Took a close look at any false positives.
    • Repeated until our confusion matrix was satisfactory.
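
    The comparison and decision steps in the list above can be sketched in a few lines of Python. This is an illustration under assumptions, not our production logic: the field names, the profile layout, and the 0.9 confidence threshold are all hypothetical.

```python
import re
import unicodedata

# Hypothetical prompt names; use whatever your own prompts are called.
REQUIRED_FIELDS = ("patient_name", "expiration_date")


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and sort the name parts."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    parts = re.findall(r"[a-z]+", ascii_name.lower())
    return " ".join(sorted(parts))


def decide(extracted, profile, min_score=0.9):
    """Approve only when every check passes; anything else goes to a human."""
    for field in REQUIRED_FIELDS:
        if not extracted.get(field):
            return "human_review"  # key information is missing
        if (extracted.get(field + "_score") or 0.0) < min_score:
            return "human_review"  # the model was not confident enough
    if normalize_name(extracted["patient_name"]) != normalize_name(profile["name"]):
        return "human_review"  # the document belongs to someone else
    return "approve"


extracted = {
    "patient_name": "Doe, Jane",
    "patient_name_score": 0.97,
    "expiration_date": "06/30/2025",
    "expiration_date_score": 0.95,
}
print(decide(extracted, {"name": "Jane Doe"}))
```

    Sorting the name parts is one simple way to treat “Doe, Jane” and “Jane Doe” as the same person. Every failed check falls through to human review, so the pipeline can only approve when it is sure.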

    For this project, false positives pose a business risk. We don’t want to approve a document that’s expired or missing key information. We kept iterating until the false-positive rate hit zero. We’ll have some false positives eventually, but fewer than what we have now with a human review process.

    False negatives, however, are harmless. If our pipeline doesn’t like a flu shot, it simply routes the document to the human team for review. If they go on to approve the document, it’s business as usual.
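
    For readers who want to track the same numbers, here is a small sketch of counting the confusion matrix from pipeline decisions and the matching human decisions. The labels and the sample data are made up for illustration.

```python
from collections import Counter


def confusion_counts(pipeline_decisions, human_decisions):
    """Count outcomes, treating 'approve' as the positive class."""
    counts = Counter()
    for predicted, actual in zip(pipeline_decisions, human_decisions):
        if predicted == "approve" and actual == "approve":
            counts["true_positive"] += 1
        elif predicted == "approve":
            counts["false_positive"] += 1  # risky: approved a bad document
        elif actual == "approve":
            counts["false_negative"] += 1  # harmless: a human approves it later
        else:
            counts["true_negative"] += 1
    return counts


def false_positive_rate(counts):
    """Share of truly invalid documents that the pipeline approved."""
    negatives = counts["false_positive"] + counts["true_negative"]
    return counts["false_positive"] / negatives if negatives else 0.0


# Made-up decisions for five manually reviewed documents.
pipeline = ["approve", "human_review", "human_review", "approve", "human_review"]
humans = ["approve", "approve", "reject", "approve", "reject"]

counts = confusion_counts(pipeline, humans)
print(dict(counts), false_positive_rate(counts))
```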

    The model does well with the clean/easy documents, which account for ~50% of all flu shots. If it’s messy or confusing, it goes back to the humans as before.


    Things we learned along the way

    1. The model does best at reading the document, not making decisions or doing math based on the document.

    Initially, our prompts tried to determine the validity of the document.

    Bad: Is the document already expired?

    We found it far more effective to limit our prompts to questions that could be answered by looking at the document. The LLM doesn’t determine anything. It just grabs the relevant data points off the page.

    Good: What is the expiration date?

    Save the results and do the math downstream.
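
    Here is a minimal sketch of what “do the math downstream” can look like in Python: the model only returns the date as text, and ordinary code decides whether it has passed. The list of date formats is an assumption; extend it to match what your documents actually contain.

```python
from datetime import date, datetime

# Assumed formats; real documents will likely need more of them.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%m-%d-%Y", "%B %d, %Y")


def parse_date(text):
    """Try each known format and return a date, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def is_expired(expiration_text, today):
    """Return True or False, or None when the date could not be read."""
    expires = parse_date(expiration_text or "")
    if expires is None:
        return None  # unreadable: let a human look at it
    return expires < today


print(is_expired("06/30/2025", date(2025, 7, 1)))
print(is_expired("sometime soon", date(2025, 7, 1)))
```

    Returning None for an unreadable date keeps “expired” and “could not tell” separate, so the second case can be routed to human review instead of being guessed at.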

    2. You still need to be thoughtful about training data

    We had a few duplicate flu shots from one clinician in our training data. Call this clinician Ben. One of our prompts was, “what’s the patient’s name?” Because “Ben” was in the training data multiple times, any remotely unclear document would return with “Ben” as the patient name.

    So overfitting is still a thing. Over/under sampling is still a thing. We tried again with a more thoughtful collection of training documents and things did much better.
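
    One cheap guard against the “Ben” problem is to cap how many documents any one clinician contributes before picking the training set. The sketch below assumes each candidate document carries a clinician identifier; the field names and sample records are hypothetical.

```python
from collections import defaultdict


def cap_per_clinician(documents, max_per_clinician=1):
    """Keep at most max_per_clinician documents for each clinician."""
    kept = []
    seen = defaultdict(int)
    for doc in documents:
        key = doc["clinician_id"]
        if seen[key] < max_per_clinician:
            kept.append(doc)
            seen[key] += 1
    return kept


# Made-up candidates: "ben" appears three times, the others once each.
candidates = [
    {"clinician_id": "ben", "file": "ben_1.pdf"},
    {"clinician_id": "ben", "file": "ben_2.pdf"},
    {"clinician_id": "ana", "file": "ana_1.pdf"},
    {"clinician_id": "ben", "file": "ben_3.pdf"},
    {"clinician_id": "cy", "file": "cy_1.pdf"},
]

training_set = cap_per_clinician(candidates)
print([doc["file"] for doc in training_set])
```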

    Document AI is pretty magical, but not that magical. Fundamentals still matter.

    3. The model can be fooled by writing on a napkin.

    To my knowledge, Snowflake doesn’t have a way to render the document image as an embedding. You can create an embedding from the extracted text, but that won’t tell you if the text was written by hand or not. As long as the text is valid, the model and downstream logic will give it a green light.

    You could fix this pretty easily by comparing image embeddings of submitted documents to the embeddings of accepted documents. Any document with an embedding way out in left field is sent back for human review. This is straightforward work, but you’ll have to do it outside Snowflake for now.
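
    A rough sketch of that outlier check, in plain Python. It assumes you already have image embeddings from some model of your choice outside Snowflake; the three-dimensional vectors and the 0.8 similarity threshold below are toy values chosen for illustration, not recommendations.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def needs_human_review(embedding, accepted_embeddings, min_similarity=0.8):
    """Flag a document whose image embedding is far from the accepted ones."""
    center = centroid(accepted_embeddings)
    return cosine_similarity(embedding, center) < min_similarity


# Toy embeddings standing in for real image embeddings.
ACCEPTED = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [1.0, 0.0, 0.1]]
typical_form = [0.85, 0.15, 0.05]
napkin = [0.0, 0.2, 0.9]

print(needs_human_review(typical_form, ACCEPTED))
print(needs_human_review(napkin, ACCEPTED))
```

    Comparing against a single centroid is the simplest version of the idea. If accepted documents come in several distinct layouts, comparing against the nearest accepted example may work better.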

    4. Not as expensive as I was expecting

    Snowflake has a reputation for being spendy. And for HIPAA compliance concerns we run a higher-tier Snowflake account for this project. I tend to worry about running up a Snowflake tab.

    In the end we had to try extra hard to spend more than $100/week while training the model. We ran thousands of documents through the model every few days to measure its accuracy while iterating on the model, but never managed to break the budget.

    Better still, we’re saving money on the manual review process. The cost of AI reviewing 1000 documents (it approves ~500 documents) is ~20% of the cost we spend on humans reviewing the remaining 500. All in, a 40% reduction in costs for reviewing flu shots.
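
    The 40% figure follows from the numbers above if you read them this way: before, humans reviewed all 1000 documents; now humans review 500, and the AI pass over all 1000 costs about 20% of what those 500 human reviews cost. A quick check in Python, with the human cost per document set to an arbitrary 1.0 because only the ratio matters:

```python
def review_costs(total_docs, auto_approved_share, ai_cost_ratio, human_cost_per_doc=1.0):
    """Return (cost before, cost after, fractional reduction)."""
    remaining = total_docs * (1 - auto_approved_share)  # still reviewed by humans
    before = total_docs * human_cost_per_doc            # humans review everything
    human_after = remaining * human_cost_per_doc
    ai_after = ai_cost_ratio * human_after              # AI cost relative to remaining human cost
    after = human_after + ai_after
    return before, after, 1 - after / before


before, after, reduction = review_costs(1000, 0.5, 0.2)
print(before, after, round(reduction, 2))
```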


    Summing up

    I’ve been impressed with how quickly we could complete a project of this scope using Document AI. We’ve gone from months to days. I give it 4 stars out of 5, and am open to giving it a fifth star if Snowflake ever gives us access to image embeddings.

    Since flu shots, we’ve deployed similar models for other documents with similar or better results. And with all this prep work, instead of dreading the upcoming flu season, we’re ready to bring it on.


