Product Scraper
Learning how to leverage secrets and images on Tensorlake Serverless.
In this tutorial, we will:
- Create a Tensorlake Graph
- Test Locally
- Define Dependencies and Secrets
- Deploy to Tensorlake Serverless
- Invoke the Graph Remotely
- Troubleshoot Remote Executions
Let’s create a simple workflow that scrapes an e-commerce product page, summarizes the product details, and extracts some structured information about the product.
Prerequisites
Before proceeding, ensure you have the following:
- Python Environment: Python 3.9 or higher installed.
- Tensorlake Account: Sign up at Tensorlake.
- API Key: After creating your account, generate an API key for the Tensorlake CLI and set it as an environment variable:
- Tensorlake SDK: Install the Tensorlake SDK using pip:
- OpenAI API Key: Create one at OpenAI.
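The setup steps above can be sketched as shell commands. The `TENSORLAKE_API_KEY` variable name and the placeholder value are assumptions; use the variable name shown in your Tensorlake dashboard when you generate the key.

```shell
# Set the Tensorlake API key for the CLI (variable name is an assumption).
export TENSORLAKE_API_KEY="<your-tensorlake-api-key>"

# Install the Tensorlake SDK.
pip install tensorlake
```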
Step 1: Writing the Graph
In `workflow.py`, we write three functions:

- `scrape_website` will leverage https://jina.ai/reader/ to parse websites into text.
- `summarize_text` will leverage OpenAI's ChatGPT to summarize the text output by `scrape_website`.
- `extract_structured_data` will leverage OpenAI's ChatGPT to extract structured data, defined as a Python class, from the text output by `scrape_website`.

The Graph `website-summarizer` executes the `scrape_website` function, and then executes both `summarize_text` and `extract_structured_data` in parallel with the output of the scraper.
Step 2: Testing Locally
Before running the code locally, we need to ensure all of the graph's dependencies are available locally. For this graph, run `pip install openai` to install the OpenAI SDK.
Additionally, the OpenAI SDK requires the `OPENAI_API_KEY` environment variable:
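For example (the placeholder value is illustrative):

```shell
# Install the OpenAI SDK locally.
pip install openai

# Make the OpenAI API key available to the local run.
export OPENAI_API_KEY="<your-openai-api-key>"
```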
Once the dependencies and secrets are available, add the following code to enable running the graph locally:
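A sketch of that entrypoint, appended to the end of `workflow.py`, might look like this. It assumes the Graph object is named `g` and that the SDK's local-run API takes the shape `g.run(...)` returning an invocation id, with `g.outputs(...)` fetching per-function results; the exact signatures and the URL are assumptions.

```python
# Appended to workflow.py; `g` is the Graph defined above.
if __name__ == "__main__":
    invocation_id = g.run(
        block_until_done=True,
        url="https://example.com/some-product",  # illustrative URL
    )
    # Two print statements: the summary and the structured extraction.
    print(g.outputs(invocation_id, "summarize_text"))
    print(g.outputs(invocation_id, "extract_structured_data"))
```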
Running `python workflow.py` will execute the workflow locally and print the outputs. There are two print statements
for this graph: one for the text summary and one for the structured extraction.
Step 3: Defining Dependencies and Secrets
The current version of the graph requires some Python dependencies and some environment variables containing secrets.
Tensorlake Serverless provides Images and Secrets to define what a `tensorlake_function` requires when running on the Tensorlake Cloud.
Dependencies
With Tensorlake Serverless, every function runs in its own sandbox defined via images. We define two images and associate each with the function that requires it.
As part of adding an image attribute to the `tensorlake_function` decorator, we also moved the imports inside each function.
This allows creating smaller per-function images without bundling every dependency into every image, reducing cold-start times when large dependencies, such as AI models, are needed.
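A sketch of the image definitions, assuming the SDK exposes an `Image` builder as in the pattern above; the image names and pip packages are illustrative.

```python
from tensorlake import Image, tensorlake_function

# Two per-function images; names and contents are illustrative.
scraper_image = Image().name("scraper-image")
openai_image = (
    Image()
    .name("openai-image")
    .run("pip install openai")  # only this image carries the OpenAI SDK
)


@tensorlake_function(image=openai_image)
def summarize_text(text: str) -> str:
    # Imported inside the function so other images don't need it.
    from openai import OpenAI
    ...
```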
Secrets
The graph requires the `OPENAI_API_KEY` environment variable, which contains a sensitive value.
Tensorlake Serverless provides secrets, which are injected at runtime into the functions that depend on them.
Secrets are stored encrypted and are only decrypted when injected into functions.
Create the tensorlake secret using the Tensorlake CLI:
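The exact subcommand syntax is an assumption; check `tensorlake --help` for the current form.

```shell
# Store the OpenAI key as a Tensorlake secret (syntax is an assumption).
tensorlake secrets set OPENAI_API_KEY=<your-openai-api-key>
```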
Change the function requiring the OpenAI API Key so that Tensorlake Serverless can inject the value at runtime:
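For example, assuming the decorator accepts a `secrets=` parameter (the parameter name is an assumption):

```python
from tensorlake import tensorlake_function


@tensorlake_function(secrets=["OPENAI_API_KEY"])
def summarize_text(text: str) -> str:
    from openai import OpenAI
    client = OpenAI()  # picks up the OPENAI_API_KEY injected at runtime
    ...
```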
Every remote invocation will now use the value of the secret we created when running the `summarize_text` and `extract_structured_data` functions.
Step 4: Deploying the Graph
The graph can be deployed as a remote API on Tensorlake Cloud and called from any application on demand.
This process builds a new image capable of running your functions and deploys the graph as a remote API.
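The command shape below is an assumption; see `tensorlake --help` for the exact syntax.

```shell
# Deploy the graph defined in workflow.py to Tensorlake Cloud.
tensorlake deploy workflow.py
```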
Step 5: Invoking the Graph Remotely
Once the graph is deployed, you can invoke it remotely by modifying the main code:
Alternatively, you can obtain a reference to the deployed graph and invoke it:
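A sketch of the second approach, assuming the SDK exposes a `RemoteGraph` handle with a by-name lookup and the same `run`/`outputs` surface used for local execution; the lookup method, signatures, and URL are assumptions.

```python
from tensorlake import RemoteGraph

# Obtain a handle to the deployed graph by its name.
graph = RemoteGraph.by_name("website-summarizer")

# Invoke remotely; the input maps to the starting node's `url` parameter.
invocation_id = graph.run(
    block_until_done=True,
    url="https://example.com/some-product",  # illustrative URL
)

# Fetch each function's result by invocation id and function name.
print(graph.outputs(invocation_id, "summarize_text"))
print(graph.outputs(invocation_id, "extract_structured_data"))
```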
The Graph is called with the input of the starting node of the graph, in this case `scrape_website`, so the input to the graph is the `url` parameter.
The result of calling a graph is an `Invocation`. Since data applications can take a long time to complete, calling `outputs` on an invocation will wait for the invocation to complete.
In either case, the result of an individual function can be retrieved using the invocation ID and the name of the function.
Step 6: Monitoring and Troubleshooting
Monitor your graph’s invocations and logs using the Tensorlake CLI:
These commands help you track executions and diagnose any issues that may arise during remote invocations.