Close Menu
    Trending
    • How AI Is Transforming Creative Industries: From Art to Music to Writing | by AI With Lil Bro | May, 2025
    • How You’ll Feel Reaching Various Millionaire Milestones ($1-$20M)
    • How to Unlock Your Brand’s Potential Through Engaging Content
    • Uh-Uh, Not Guilty | Towards Data Science
    • How Spark Actually Works: Behind the Curtain of Your First .show() | by B V Sarath Chandra | May, 2025
    • Best Practices for Managing a Virtual Medical Receptionist
    • How I Built a Bulletproof Portfolio (And What Most People Get Wrong)
    • A Practical Guide to BERTopic for Transformer-Based Topic Modeling
    Finance StarGate
    • Home
    • Artificial Intelligence
    • AI Technology
    • Data Science
    • Machine Learning
    • Finance
    • Passive Income
    Finance StarGate
    Home»Artificial Intelligence»Generating Data Dictionary for Excel Files Using OpenPyxl and AI Agents
    Artificial Intelligence

    Generating Data Dictionary for Excel Files Using OpenPyxl and AI Agents

    FinanceStarGateBy FinanceStarGateMay 8, 2025No Comments11 Mins Read
    Share Facebook Twitter Pinterest LinkedIn Tumblr Reddit Telegram Email
    Share
    Facebook Twitter LinkedIn Pinterest Email


    Introduction

    Each firm I labored for till as we speak, there it was: the resilient MS Excel.

    Excel was first launched in 1985 and has remained robust till as we speak. It has survived the rise of relational databases, the evolution of many programming languages, the Web with its infinite variety of on-line functions, and at last, additionally it is surviving the period of the AI.

    Phew!

    Do you’ve gotten any doubts about how resilient Excel is? I don’t.

    I feel the rationale for that’s its practicality to begin and manipulate a doc shortly. Take into consideration this case: we’re at work, in a gathering, and all of the sudden the management shares a CSV file and asks for a fast calculation or a number of calculated numbers. Now, the choices are:

    1. Open an IDE (or a pocket book) and begin coding like loopy to generate a easy matplotlib graphic;

    2. Open Energy BI, import the info, and begin making a report with dynamic graphics.

    3. Open the CSV in Excel, write a few formulation, and create a graphic.

    I can’t communicate for you, however many occasions I’m going for choice 3. Particularly as a result of Excel recordsdata are suitable with all the things, simply shareable, and beginner-friendly.

    I’m saying all of this as an Introduction to make my level that I don’t suppose that Excel recordsdata are going away anytime quickly, even with the quick improvement of AI. Many will love that, many will hate that.

    So, my motion right here was to leverage AI to make Excel recordsdata higher documented. One of many foremost complaints of information groups about Excel is the dearth of greatest practices and reproducibility, provided that the names of the columns can have any names and knowledge varieties, however zero documentation.

    So, I’ve created an AI Agent that reads the Excel file and creates this small documentation. Right here is the way it works:

    1. The Excel file is transformed to CSV and fed into the Giant Language Mannequin (LLM).
    2. The AI Agent generates the info dictionary with column data (variable identify, knowledge kind, description).
    3. The info dictionary will get added as feedback to the Excel file’s header.
    4. Output file saved with feedback.

    Okay. Palms-on now. Let’s get that executed on this tutorial.

    Code

    Let’s code! | Picture generated by AI. Meta Llama, 2025. https://meta.ai

    We’ll start by organising a digital atmosphere. Create a venv with the instrument of your selection, akin to Poetry, Python Venv, Anaconda, or UV. I actually like UV, as it’s the quickest and the only, for my part. When you’ve got UV put in [5], open a terminal and create your venv.

    uv init data-docs
    cd data-docs
    uv venv
    uv add streamlit openpyxl pandas agno mcp google-genai

    Now, allow us to import the mandatory modules. This venture was created with Python 3.12.1, however I imagine Python 3.9 or larger would possibly do the trick already. We’ll use:

    • Agno: for the AI Agent administration
    • OpenPyxl: for the manipulation of Excel recordsdata
    • Streamlit: for the front-end interface.
    • Pandas, OS, JSON, Dedent and Google Genai as help modules.
    # Imports
    import os
    import json
    import streamlit as st
    from textwrap import dedent
    
    from agno.agent import Agent
    from agno.fashions.google import Gemini
    from agno.instruments.file import FileTools
    
    from openpyxl import load_workbook
    from openpyxl.feedback import Remark
    import pandas as pd

    Nice. The following step is creating the capabilities we’ll have to deal with the Excel recordsdata and to create the AI Agent.

    Discover that each one the capabilities have detailed docstrings. That is intentional as a result of LLMs use docstrings to know what a given perform does and determine whether or not to make use of it or not as a instrument.

    So, when you’re utilizing Python capabilities as Instruments for an AI Agent, be certain to make use of detailed docstrings. These days, with free copilots akin to Windsurf [6] it’s even simpler to create them.

    Changing the file to CSV

    This perform will:

    • Take the Excel file and browse solely the primary 10 rows. That is sufficient for us to ship to the LLM. Doing that, we’re additionally stopping sending too many tokens as enter and making this agent too costly.
    • Save the file as CSV to make use of as enter for the AI Agent. The CSV format is simpler for the mannequin to soak up, as it’s a bunch of textual content separated by commas. And we all know LLMs shine working with textual content.

    Right here is the perform.

    def convert_to_csv(file_path:str):
       """
        Use this instrument to transform the excel file to CSV.
    
        * file_path: Path to the Excel file to be transformed
        """
       # Load the file  
       df = pd.read_excel(file_path).head(10)
    
       # Convert to CSV
       st.write("Changing to CSV... :leftwards_arrow_with_hook:")
       return df.to_csv('temp.csv', index=False)

    Let’s transfer on.

    Creating the Agent

    The following perform creates the AI agent. I’m utilizing Agno [1], as it is rather versatile and simple to make use of. I additionally selected the mannequin Gemini 2.0 Flash. Throughout the check part, this was the best-performing mannequin producing the info docs. To make use of it, you’ll need an API Key from Google. Don’t overlook to get one right here [7].

    The perform:

    • Receives the CSV output from the earlier perform.
    • Passes by means of the AI Agent, which generates the info dictionary with column identify, description, and knowledge kind.
    • Discover that the description argument is the immediate for the agent. Make it detailed and exact.
    • The info dictionary shall be saved as a JSON file utilizing a instrument referred to as FileTools that may learn and write recordsdata.
    • I’ve arrange retries=2 so we are able to work round any error on a primary strive.
    def create_agent(apy_key):
        agent = Agent(
            mannequin=Gemini(id="gemini-2.0-flash", api_key=apy_key),
            description= dedent("""
                                You might be an agent that reads the temp.csv dataset offered to you and 
                                based mostly on the identify and knowledge kind of every column header, decide the next data:
                                - The info kinds of every column
                                - The outline of every column
                                - The primary column numer is 0
    
                                Utilizing the FileTools offered, create a knowledge dictionary in JSON format that features the beneath data:
                                {<ColNumber>: {ColName: <ColName>, DataType: <DataType>, Description: <Description>}}
    
                                In case you are unable to find out the info kind or description of a column, return 'N/A' for that column for the lacking values.
                                
                                """),
            instruments=[ FileTools(read_files=True, save_files=True) ],
            retries=2,
            show_tool_calls=True
            )
    
        return agent
    

    Okay. Now we’d like one other perform to save lots of the info dictionary to the file.

    Including Information Dictionary to the File’s Header

    That is the final perform to be created. It would:

    • Get the info dictionary json from the earlier step and the unique Excel file.
    • Add the info dictionary to the file’s header as feedback.
    • Save the output file.
    • As soon as the file is saved, it shows a obtain button for the consumer to get the modified file.
    def add_comments_to_header(file_path:str, data_dict:dict="data_dict.json"):
        """
        Use this instrument so as to add the info dictionary {data_dict.json} as feedback to the header of an Excel file and save the output file.
    
        The perform takes the Excel file path as argument and provides the {data_dict.json} as feedback to every cell
        Begin counting from column 0
        within the first row of the Excel file, utilizing the next format:    
            * Column Quantity: <column_number>
            * Column Identify: <column_name>
            * Information Kind: <data_type>
            * Description: <description>
    
        Parameters
        ----------
        * file_path : str
            The trail to the Excel file to be processed
        * data_dict : dict
            The info dictionary containing the column quantity, column identify, knowledge kind, description, and variety of null values
    
        """
        
        # Load the info dictionary
        data_dict = json.load(open(data_dict))
    
        # Load the workbook
        wb = load_workbook(file_path)
    
        # Get the energetic worksheet
        ws = wb.energetic
    
        # Iterate over every column within the first row (header)
        for n, col in enumerate(ws.iter_cols(min_row=1, max_row=1)):
            for header_cell in col:
                header_cell.remark = Remark(dedent(f"""
                                  ColName: {data_dict[str(n)]['ColName']}, 
                                  DataType: {data_dict[str(n)]['DataType']},
                                  Description: {data_dict[str(n)]['Description']}
        """),'AI Agent')
    
        # Save the workbook
        st.write("Saving File... :floppy_disk:")
        wb.save('output.xlsx')
    
        # Create a obtain button
        with open('output.xlsx', 'rb') as f:
            st.download_button(
                label="Obtain output.xlsx",
                knowledge=f,
                file_name='output.xlsx',
                mime='utility/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
            )
    

    Okay. The following step is to attach all of this collectively on a Streamlit front-end script.

    Streamlit Entrance-Finish

    On this step, I might have created a special file for the front-end and imported the capabilities in there. However I made a decision to make use of the identical file, so let’s begin with the well-known:

    if __name__ == "__main__":

    First, a few traces to configure the web page and messages displayed within the Internet Software. We’ll use the content material centered on the web page, and there may be some details about how the App works.

    # Config web page Streamlit
        st.set_page_config(structure="centered", 
                           page_title="Information Docs", 
                           page_icon=":paperclip:",
                           initial_sidebar_state="expanded")
        
        # Title
        st.title("Information Docs :paperclip:")
        st.subheader("Generate a knowledge dictionary in your Excel file.")
        st.caption("1. Enter your Gemini API key and the trail of the Excel file on the sidebar.")
        st.caption("2. Run the agent.")
        st.caption("3. The agent will generate a knowledge dictionary and add it as feedback to the header of the Excel file.")
        st.caption("ColName: <ColName> | DataType: <DataType> | Description: <Description>")
        
        st.divider()

    Subsequent, we’ll arrange the sidebar, the place the consumer can enter their API Key from Google and choose a .xlsx file to be modified.

    There’s a button to run the appliance, one other to reset the app state, and a progress bar. Nothing too fancy.

    with st.sidebar:
            # Enter your API key
            st.caption("Enter your API key and the trail of the Excel file.")
            api_key = st.text_input("API key: ", placeholder="Google Gemini API key", kind="password")
            
            # Add file
            input_file = st.file_uploader("File add", 
                                           kind='xlsx')
            
    
            # Run the agent
            agent_run = st.button("Run")
    
            # progress bar
            progress_bar = st.empty()
            progress_bar.progress(0, textual content="Initializing...")
    
            st.divider()
    
            # Reset session state
            if st.button("Reset Session"):
                st.session_state.clear()
                st.rerun()

    As soon as the run button is clicked, it triggers the remainder of the code to run the Agent. Right here is the sequence of steps carried out:

    1. The primary perform known as to rework the file to CSV
    2. The progress is registered on the progress bar.
    3. The Agent is created.
    4. Progress bar up to date.
    5. A immediate is fed into the agent to learn the temp.csv file, create the info dictionary, and save the output to data_dictionary.json.
    6. The info dictionary is printed on the display, so the consumer can see what was generated whereas it’s being saved to the Excel file.
    7. The Excel file is modified and saved.
    # Create the agent
        if agent_run:
            # Convert Excel file to CSV
            convert_to_csv(input_file)
    
            # Register progress
            progress_bar.progress(15, textual content="Processing CSV...")
    
            # Create the agent
            agent = create_agent(api_key)
    
            # Begin the script
            st.write("Operating Agent... :runner:")
    
            # Register progress
            progress_bar.progress(50, textual content="AI Agent is operating...")
    
            # Run the agent    
            agent.print_response(dedent(f"""
                                    1. Use FileTools to learn the temp.csv as enter to create the info dictionary for the columns within the dataset. 
                                    2. Utilizing the FileTools instrument, save the info dictionary to a file named 'data_dict.json'.
                                    
                                    """),
                            markdown=True)
    
            # Print the info dictionary
            st.write("Producing Information Dictionary... :page_facing_up:")
            with open('data_dict.json', 'r') as f:
                data_dict = json.load(f)
                st.json(data_dict, expanded=False)
    
            # Add feedback to header
            add_comments_to_header(input_file, 'data_dict.json')
    
            # Take away short-term recordsdata
            st.write("Eradicating short-term recordsdata... :wastebasket:")
            os.take away('temp.csv')
            os.take away('data_dict.json')    
        
        # If file exists, present success message
        if os.path.exists('output.xlsx'):
            st.success("Accomplished! :white_check_mark:")
            os.take away('output.xlsx')
    
        # Progress bar finish
        progress_bar.progress(100, textual content="Accomplished!")

    That’s it. Here’s a demonstration of the agent in motion.

    Information Docs added to your Excel File. Picture by the writer.

    Stunning consequence!

    Strive It

    You possibly can strive the deployed app right here: https://excel-datadocs.streamlit.app/

    Earlier than You Go

    In my humble opinion, Excel recordsdata will not be going away anytime quickly. Loving or hating them, we’ll have to stay with them for some time.

    Excel recordsdata are versatile, simple to deal with and share, thus they’re nonetheless very helpful for the routine ad-hoc duties at work.

    Nonetheless, now we are able to leverage AI to assist us deal with these recordsdata and make them higher. Artificial Intelligence is touching so many factors of our lives. The routine and instruments at work are solely one other one.

    Let’s benefit from AI and work smarter daily!

    When you appreciated this content material, discover extra of my work in my web site and GitHub, shared beneath.

    GitHub Repository

    Right here is the GitHub Repository for this venture.

    https://github.com/gurezende/Data-Dictionary-GenAI

    Discover Me

    You’ll find extra about my work on my web site.

    https://gustavorsantos.me

    References

    [1. Agno Docs] https://docs.agno.com/introduction/agents

    [2. Openpyxl Docs] https://openpyxl.readthedocs.io/en/stable/index.html

    [3. Streamlit Docs] https://docs.streamlit.io/

    [4. Data-Docs Web App] https://excel-datadocs.streamlit.app/

    [5. Installing UV] https://docs.astral.sh/uv/getting-started/installation/

    [6. Windsurf Coding Copilot] https://windsurf.com/vscode_tutorial

    [7. Google Gemini API Key] https://ai.google.dev/gemini-api/docs/api-key

    The put up Generating Data Dictionary for Excel Files Using OpenPyxl and AI Agents appeared first on Towards Data Science.



    Source link

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Previous ArticleExpert AI and Machine Learning Dissertation Writing and Coding Services for UK Students | by Tutorsindia | May, 2025
    Next Article 5 Ways CEOs Can Assess and Reset Their Company Culture
    FinanceStarGate

    Related Posts

    Artificial Intelligence

    Uh-Uh, Not Guilty | Towards Data Science

    May 8, 2025
    Artificial Intelligence

    A Practical Guide to BERTopic for Transformer-Based Topic Modeling

    May 8, 2025
    Artificial Intelligence

    Real-Time Interactive Sentiment Analysis in Python

    May 8, 2025
    Add A Comment
    Leave A Reply Cancel Reply

    Top Posts

    Yum! Brands Brings AI to Drive-Thrus With Nvidia Partnership

    March 19, 2025

    If You’re Not Using Chatbots, You’re Failing Your Customers

    April 13, 2025

    09337624612

    April 6, 2025

    Let’s Call a Spade a Spade: RDF and LPG — Cousins Who Should Learn to Live Together

    April 7, 2025

    Reducing Time to Value for Data Science Projects: Part 1

    May 1, 2025
    Categories
    • AI Technology
    • Artificial Intelligence
    • Data Science
    • Finance
    • Machine Learning
    • Passive Income
    Most Popular

    Three Lovely Projects And One Failure | by Shmulik Cohen | Mar, 2025

    March 17, 2025

    Massachusetts Launches Flagship AI Models Innovation Challenge, $2M in Funding

    February 14, 2025

    Customizing generative AI for unique value

    March 4, 2025
    Our Picks

    Gen Z Workers Stream Movies, Shows, While Working: Report

    April 1, 2025

    Integrity Sense-Checking Your AI Tools and Machine Learning Models to Reduce AI Hallucinations | by Katrina Young | Apr, 2025

    April 23, 2025

    Meta Has Block Lists of Ex-Employees It Won’t Rehire

    March 7, 2025
    Categories
    • AI Technology
    • Artificial Intelligence
    • Data Science
    • Finance
    • Machine Learning
    • Passive Income
    • Privacy Policy
    • Disclaimer
    • Terms and Conditions
    • About us
    • Contact us
    Copyright © 2025 Financestargate.com All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.