The foreach activity in Azure data factory is one of the most useful activities. It gives ADF the power to run generic actions against multiple objects. for example, to copy multiple tables from source to target, based on a table list.
But if you want to use variables to store the values changed between iterations be careful. You might not get what you expect.
To demonstrate this, I'll create a simple pipeline that copies rows from one SQL table to another.
My rows are just some letters:
And my pipeline is built like this:
a lookup activity to retrieve my source table
a foreach activity to run on each row from the lookup (5 iterations)
inside the foreach activity - set the letter to a variable
inside the foreach activity - write the letter to the destination table
My destination table should look just like the source table, right?
Running the pipeline successfully, and these are the results:
What just happened?
Unlike traditional programming languages, where loop iterations run in serial (one after the other), the foreach activity in data factory runs in parallel. This is done to boost performance so that activities will run at the same time and the pipeline will finish faster.
But that might cause an issue if you assign values and then use these values in variables. There is only one instance of the variable and each loop iteration will change it at the same time, which will cause unexpected values. In the example above, each iteration changed the value of the variable to the respective letter, until the last one changed it to "E", and on the next activity "E" was written to the destination table on all iterations.
How can we work around this behavior?
Option 1- use serial run
You can force the foreach activity to run in serial instead of parallel and then each variable change will occur before the table write, with no mix-up.
This pipeline will, of course, run slower.
To do that, stand on the foreach activity and click settings, then check the "sequential" box.
Option 2 - Don't use variables
You can use the value return from the iteration directly, without going through a variable.
To do that, where you want to use the variable, simply use item().field (field being the name of the column).
for example, on the write to SQL activity:
Instead of using:
@concat('insert into dbo.destination_letters(letter) values(',
'''',
variables('v_letter'),
''')')
Use:
@concat('insert into dbo.destination_letters(letter) values(',
'''',
item().letter,
''')')
So, variables are very useful, and foreach is very useful, just make sure they play nicely with one another.
Comments