Getting duplicates or unexpected results? Here, we explain how to set up your Generic Aggregation step correctly.
Introduction
Enriching data from different sources can be a complex process, but some tools can streamline the task. The "Generic Aggregation" step is one of them and can greatly simplify data aggregation. In this article, we explain how to use it following best practices, based on the example of the "Extract People Profile" and "Extract Company Profile" steps.
Best Practices
First check before using the Generic Aggregation:
Make sure that the two steps you want to aggregate share key data, i.e. a column whose values match between them:
- Start by launching a small batch and checking the outputs of each step to identify a column that can serve as common key data.
An "id" column is generally the best choice for this purpose.
- Sometimes, two columns have the same name but contain different data. For example, "profile_linkedin_id" may differ from one automation to another.
Make sure to use the appropriate column to avoid aggregation errors.
NOTE: The column names do not need to be the same, but the data inside them must match. Check the results of each output and filter by column for better visibility.
If you can't find any matching columns, unfortunately you cannot use the Generic Aggregation.
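The check above can be sketched in Python: the function below compares two small sample outputs and lists the column pairs whose values overlap, regardless of their names. This is a hypothetical helper you would run outside the tool, and the rows and column names (`id`, `company_id`, `current_company_id`, ...) are made-up placeholders, not the real columns produced by the LinkedIn steps. The 50% overlap threshold is also an arbitrary assumption.

```python
def candidate_keys(left_rows, right_rows, min_overlap=0.5):
    """List (left_column, right_column, overlap) pairs whose values overlap.

    overlap is the share of distinct, non-empty left values that also appear
    in the right column. min_overlap=0.5 is an arbitrary, hypothetical threshold.
    """
    left_cols = sorted({col for row in left_rows for col in row})
    right_cols = sorted({col for row in right_rows for col in row})
    results = []
    for lc in left_cols:
        left_values = {row.get(lc) for row in left_rows} - {None, ""}
        if not left_values:
            continue  # nothing to compare in an empty column
        for rc in right_cols:
            right_values = {row.get(rc) for row in right_rows} - {None, ""}
            overlap = len(left_values & right_values) / len(left_values)
            if overlap >= min_overlap:
                results.append((lc, rc, overlap))
    return results


# Hypothetical sample outputs from a small test batch (placeholder values).
companies = [
    {"id": "c-1", "company_id": "1035", "company_name": "Acme"},
    {"id": "c-2", "company_id": "2048", "company_name": "Globex"},
]
people = [
    {"id": "p-1", "full_name": "Ada", "current_company_id": "1035"},
    {"id": "p-2", "full_name": "Bob", "current_company_id": "2048"},
    {"id": "p-3", "full_name": "Cy", "current_company_id": "9999"},
]

print(candidate_keys(companies, people))
# [('company_id', 'current_company_id', 1.0)]
```

Note that `id` exists on both sides but is not suggested, because its values never match: this is the "same name, different data" case described above. Conversely, `company_id` and `current_company_id` have different names but matching data.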
If the data match:
Set the step closest to the Generic Aggregation as FOR EACH, and the other one as COMBINE:
Let's take a simple example with the LinkedIn People Profile and LinkedIn Company Profile steps, used to obtain People Profiles enriched with company information. Follow this order:
- Set "LinkedIn Company Profile" as "FOR EACH"
- Set "LinkedIn People Profile" as "COMBINE"
This way, you will only have "People Profiles" that have an associated company.
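A rough way to picture this setup is as a join on the key column: each FOR EACH row is paired with the COMBINE rows that share its key value. The Python sketch below models that behavior under this assumption only; it is a conceptual illustration, not the Generic Aggregation's actual implementation, and the function name `aggregate` and the column names are hypothetical.

```python
def aggregate(for_each_rows, combine_rows, for_each_key, combine_key):
    """Conceptual model: pair each FOR EACH row with every COMBINE row sharing its key.

    COMBINE rows whose key matches no FOR EACH row are left out of the result.
    """
    # Group COMBINE rows by key value so each lookup is a dictionary access.
    index = {}
    for row in combine_rows:
        key = row.get(combine_key)
        if key not in (None, ""):
            index.setdefault(key, []).append(row)

    merged = []
    for outer in for_each_rows:
        for inner in index.get(outer.get(for_each_key), []):
            # On a column-name clash, the COMBINE row's value wins here.
            merged.append({**outer, **inner})
    return merged


# Hypothetical sample data: companies as FOR EACH, people as COMBINE.
companies = [
    {"company_id": "1035", "company_name": "Acme"},
    {"company_id": "2048", "company_name": "Globex"},
]
people = [
    {"full_name": "Ada", "current_company_id": "1035"},
    {"full_name": "Bob", "current_company_id": "2048"},
    {"full_name": "Cy", "current_company_id": "9999"},
]

result = aggregate(companies, people, "company_id", "current_company_id")
print([r["full_name"] for r in result])
# ['Ada', 'Bob']
```

Cy is dropped because company 9999 is not in the FOR EACH output, which mirrors "only People Profiles that have an associated company". In this model, a company that appears twice in the FOR EACH output would emit each of its people twice, which is one possible way duplicates can appear.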
Please note when aggregating Company Profiles to People Profiles:
Some companies may not be accessible from certain LinkedIn profiles, even when they are indeed the profile's company. This may be a limitation on LinkedIn's side: when checking manually, you may find that the current company cannot be clicked, even though it exists.
Check out this article for more information: Discrepancies on the output from a step to another
Conclusion
By using the Generic Aggregation appropriately and following these best practices, you can simplify the enrichment of data from various sources. Keep in mind the potential limitations of LinkedIn (if LinkedIn steps are used) and make sure you use the correct columns for data aggregation.
Following these guidelines will help you obtain accurate and relevant results.
Well, you now know how to set up your Generic Aggregation! Apply the same practice to other steps. :)