We already have a detailed article on how to launch a workflow with the API, as well as a concise quick-start guide. This article goes further: a definitive guide to fully embedding Captain Data into your product using our API.
There are a few things to take into account when running automations:
- First, and it won't be a surprise: automations sometimes break
- You also have to handle different sets of limits depending on the app, as well as API rate limiting
- You need to split jobs (runs) into multiple batches to avoid handling hundreds of thousands of lines at once
Long story short, you need a robust orchestration system, and that's what we aim to provide.
Overall Architecture to Launch & Get Job Results
Nothing fancy in this process; the only thing to keep in mind is that we batch-process everything.
There are multiple reasons for this:
- We can handle hundreds of thousands of lines for a job
- We need to apply limits for each account to make sure workflows run safely
- We need to comply with rate limiting
This means you should use webhooks whenever you need to ingest a workflow's results.
Webhook logic
Each time a webhook fires - on success or failure - you will receive a job_uid.
You can use this job_uid to trigger the GET endpoint Job Results:
https://api.captaindata.co/v3/jobs/:job_uid/results
As you can see, you only need to provide the job_uid in the URL route to get the first 1000 results.
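Here is a minimal sketch of that flow: a webhook receiver that reads the job_uid from the payload and fetches the first page of results. The Flask route, the Bearer-style Authorization header, and the assumption that the payload exposes a top-level job_uid field are illustrative; check your dashboard for the exact authentication scheme and webhook payload shape.

```python
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

CAPTAIN_DATA_API = "https://api.captaindata.co/v3"
# Assumption: authentication via an API key header; confirm the exact
# header name and value format in your Captain Data dashboard.
HEADERS = {"Authorization": f"Bearer {os.environ['CAPTAIN_DATA_API_KEY']}"}


@app.post("/captain-data/webhook")
def handle_webhook():
    # Assumption: the webhook payload contains at least a job_uid field.
    payload = request.get_json(force=True)
    job_uid = payload["job_uid"]

    # Fetch the first 1000 results for this job.
    resp = requests.get(f"{CAPTAIN_DATA_API}/jobs/{job_uid}/results", headers=HEADERS)
    resp.raise_for_status()
    results = resp.json()

    # ... store `results` (and the job_uid, see below) in your own database ...
    return jsonify({"status": "ok"}), 200
```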
You can then use our pagination system, either by:
- using the page query parameter (1 by default)
- using the paging object returned in the API
By using the paging object you'll get previous and next values that you can parse to recursively fetch the results - in the end, it all depends on your coding style!
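As an example, here is a hedged sketch of a pagination loop using the page query parameter, reusing the requests import and the CAPTAIN_DATA_API / HEADERS constants from the webhook sketch above. It assumes each page returns its rows under a top-level "results" key; inspect a real response to confirm the exact shape.

```python
def fetch_all_results(job_uid: str, page_size: int = 1000) -> list[dict]:
    """Collect every result page for a job by incrementing the page parameter."""
    all_results: list[dict] = []
    page = 1
    while True:
        resp = requests.get(
            f"{CAPTAIN_DATA_API}/jobs/{job_uid}/results",
            headers=HEADERS,
            params={"page": page},
        )
        resp.raise_for_status()
        # Assumption: result rows are returned under a "results" key.
        rows = resp.json().get("results", [])
        if not rows:
            break
        all_results.extend(rows)
        if len(rows) < page_size:
            break  # last, partially filled page
        page += 1
    return all_results
```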
Storing the Job UID
Since you never know what can happen, and in some cases you might need to debug, you should probably attach the source job_uid to the objects you've extracted.
Say you're pulling data from LinkedIn and something goes wrong: if you didn't store the job_uid as a reference, you'll have to trace back through the API calls you made to find the source of the issue. If you already have it... :)
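A minimal sketch of that idea, reusing fetch_all_results from the pagination example above; the save_profile function is a hypothetical stand-in for your own persistence layer.

```python
from typing import Any


def save_profile(record: dict[str, Any]) -> None:
    """Hypothetical persistence layer; replace with your DB/warehouse/CRM write."""
    print(record)


def ingest_job_results(job_uid: str) -> None:
    # Attach the source job_uid to every record so you can trace it back later.
    for row in fetch_all_results(job_uid):
        save_profile({**row, "source_job_uid": job_uid})
```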
Workflow schemas
Although Captain Data is not an ETL, we do provide an endpoint to get the workflow's schema.
It'll be enhanced in Q2/2024 with additional typings.
https://api.captaindata.co/v3/workflows/:workflow_uid/details
Using this endpoint you'll receive, among other fields:
- the input_schema, which indicates how to structure the workflow's input
- the output_schema, which lists all the keys you can expect in the workflow's output
Note that the output_schema is not always accurate and is subject to variation; if in doubt, launch the workflow from the UI and check the keys actually returned.
If you're having trouble integrating the data, don't hesitate to reach out to your main POC.
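As a quick sketch, here is how you might pull both schemas from the details endpoint, reusing the constants from the webhook example above and assuming input_schema and output_schema are returned at the top level of the payload.

```python
def get_workflow_schemas(workflow_uid: str) -> tuple[dict, dict]:
    """Fetch a workflow's input and output schemas from the details endpoint."""
    resp = requests.get(
        f"{CAPTAIN_DATA_API}/workflows/{workflow_uid}/details",
        headers=HEADERS,
    )
    resp.raise_for_status()
    details = resp.json()
    # Assumption: both schemas are top-level keys in the response payload.
    return details.get("input_schema", {}), details.get("output_schema", {})
```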
A word on important data points
This won't be an exhaustive list, but here are a few pointers.
LinkedIn Company & People Profiles
Whether you're extracting data from people or companies, you should store linkedin_profile_id and linkedin_company_id.
They're the unique, immutable IDs for both objects.
Note that while useful for de-duplication or to keep as a reference, you can't use the linkedin_profile_id to fetch a profile.
However, you can use a sales_navigator_profile_id to generate a LinkedIn profile URL.
Don't ask us why - we don't know, it just works :) That's good news, though, since it means you don't always need LinkedIn handles, which you can't extract from a Sales Navigator search, for example.
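For instance, here is a hedged de-duplication sketch keyed on linkedin_profile_id; the in-memory dictionary stands in for whatever database you actually use.

```python
# Hypothetical in-memory store keyed by the immutable linkedin_profile_id.
profiles_by_id: dict[str, dict] = {}


def upsert_profile(row: dict) -> None:
    """De-duplicate extracted people on their stable LinkedIn identifier."""
    profile_id = row.get("linkedin_profile_id")
    if not profile_id:
        return  # skip rows missing the stable identifier
    # Newer data for an existing profile is merged over the previous snapshot.
    profiles_by_id[profile_id] = {**profiles_by_id.get(profile_id, {}), **row}
```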
LinkedIn Messages & Conversations
The most important field for LinkedIn conversations is the unique thread_id.
With this key, you can retrieve the messages for a specific conversation.
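For example, here is a small sketch that groups a flat list of extracted message rows back into conversations by thread_id; the sent_at timestamp field is an assumption, so check the workflow's output_schema for the actual key names.

```python
from collections import defaultdict


def group_messages_by_thread(messages: list[dict]) -> dict[str, list[dict]]:
    """Rebuild conversations from a flat list of message rows."""
    conversations: dict[str, list[dict]] = defaultdict(list)
    for message in messages:
        conversations[message["thread_id"]].append(message)
    # Assumption: each message row carries a sortable timestamp such as "sent_at".
    for msgs in conversations.values():
        msgs.sort(key=lambda m: m.get("sent_at", ""))
    return dict(conversations)
```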