There are certain situations in FileMaker development where the current found set needs to be recreated. A good example of this is when the Perform Script On Server (PSoS) step is used. Since it runs on the server, the current context (including the found set) is lost and has to be reconstructed.
I used to recreate found sets on the server by collecting the primary key values using a custom function and then creating a find request for each value in the list, but this approach does not scale well.
# Records
Time (seconds)
100
<1
1,000
2
10,000
28
I was curious about other methods. Greg Lane wrote an excellent article on restoring found sets using snapshot links: Restoring a FileMaker Client’s Found Set Within A Server-Side Script. Another technique involves temporarily storing the IDs in a utility field and then using the Go To Related Record (GTRR) step.
I was curious about how these methods compared, so I did a bit of testing, but before we get to the results, the different methods merit a bit more discussion.
Step One: Collect Values In Found Set
In general, there are two steps involved: the first is to collect the IDs of the records in the found set, and the second is to recreate the record set.
I used to use a custom function to collect values from a field for all records in the found set. One way to write such a custom function is:
This approach has two limitations: The first is the 10,000 recursion limit that FileMaker has for custom functions when doing additive recursions. (The limit for tail recursions is 50,000.) The second is that it’s slow when the size of the found set is large.
Instead of doing this work via a custom function, we could loop through the records and collect the values. This saves us from the recursion limit but does not gain us any speed improvements (and in fact may be even slower).
FileMaker 13 introduced the ‘List of’ summary field, which is a much faster way of collecting field values from found sets. However, this approach requires you to set up a new summary field for every field whose values you want to collect.
The snapshot link affords us with yet another approach, although in this case what is collected is the internal record IDs. However, snapshot links are created even more quickly than ‘List of’ summary fields are evaluated.
Step One Summary
Method
Speed
Custom function
Recursion limit, slow with large found sets
Loop through records
Slow with large found sets
‘List of’ summary field
Fast
Snapshot link
Fast, even with really large found sets
Step Two: Recreate Found Set
The second step boils down to two alternatives: using Perform Find or GTRR. But as usual, there are some caveats.
List of Find
If we are collecting the primary key values, the fastest way to do so is using the ‘List of’ summary field. Once we have this list of IDs, we can create a find request for each one and then do a Perform Find to restore the found set. However, this approach is quite slow when working with found sets that are bigger than a few thousand.
In my testing, I refer to this method as “List of Find”.
GTRR
Instead of creating find requests and using Perform Find, we can temporarily store the list of IDs and use the Go To Related Record (GTRR) step to recreate the found set. However, as it turns out, it makes a difference if the ID is a number field or if it’s a text field. A number ID is typically set up to use the auto-enter serial setting, and a text ID typically uses the Get ( UUID ) function. When working with found sets that are larger than just a few thousand records, the GTRR step performs considerably more quickly if the ID is a number field. (And both varieties perform much, much more quickly than the List of Find method.)
In my testing, I refer to this method and its two variations as “GTRR (number)” and “GTRR (text)”.
Snapshot Find
If we use a snapshot link, then we can (very quickly) get a list of the internal record IDs. However, to make use of this list, we have to set up a stored calculation field that returns Get ( RecordID ). The basic sequence of this technique is as follows:
Create the snapshot link file
Insert the file into a global field
Parse the contents of the file to get the internal record ID list
Send the list to the server
Create find requests using the stored calc field
Perform Find
Snapshot Find: How many find requests?
Suppose we have a table with 10 records whose internal record IDs are 1 through 10, and suppose that our found set contains these eight records: 1, 2, 3, 5, 7, 8, 9, and 10. (Records 4 and 6 are missing.) The snapshot link would describe such a found set as follows:
1-357-10
So instead of having to make eight separate find requests, we now only have to make three find requests.
In the best case, the found set consists entirely of records whose internal record IDs constitute a single continuous sequence. In this case, we only have to make a single find request.
In the worst case, the found set has no continuous sequence, and we have to make a separate find request for each ID. In my testing, I construct a found set like this by only including records with odd internal record ID values.
In my testing, I refer to this method and its two variations as “Snapshot Find (best case)” and “Snapshot Find (worst case)”.
Step Two Summary
Method
Saving the Found Set
Restoring the Found Set
Considerations
List of Find
‘List of’ summary
Perform Find
—
GTRR
‘List of’ summary
GTRR
Number vs. text
Snapshot Find
Snapshot link
Perform Find
Best case vs. worst case
Test Results
So how do the methods compare?
As I alluded to already, the List of Find method is quite slow. The GTRR method is fast, with number IDs performing considerably faster than text. The Snapshot Find method performs the fastest when the found set is configured according to the ‘best case’, requiring just a single find request. But when the found set is set up according to the ‘worst case’, the performance is comparable to the List of Find method. (It’s a bit faster, because getting the list of internal record IDs from the snapshot is faster, but it’s still brutally slow.)
The answer is a bit unsatisfying, because for the Snapshot Find method, the data is shown for the ‘best case’ and the ‘worst case’, and not for the ‘typical case’. But the typical case would be difficult to reliably reconstruct in a test environment, so I had to resort to a best/worst-case type of analysis.
The test is set up to try each method multiple times, increasing the size of the found set with each iteration. Here are the results for found sets from 500 to 10,000 (in 500 increments):
Here is the same data, with the List of Find and Snapshot Find (worst case) excluded:
Below are the results for found sets from 10,000 to 100,000 (in 10,000 increments).
The test is set up to break out of the loop if a single duration lasts longer than a specified limit – in this case 20 seconds. That is why there is only a single data point for the List of Find and Snapshot Find (worst case) methods.
The same pattern seems to hold. Snapshot Find (best case) is still the fastest. GTRR is in the middle, with the number ID being faster. And the List of Find and Snapshot Find (worst case) are very slow.
Here are the results from testing found sets that are quite large and therefore probably fairly atypical. The increment levels were set differently for each method, and, for each method, the test was set to stop either after a single iteration exceeded 60 seconds or after it had completed 50 iterations. The same pattern seems to continue, although there is quite a bit of variation with the GTRR (text) method.
The GTRR (text) variation may have been due to latency hiccups, or the server may have been busy with something else during that time. During other test runs, the GTRR (text) method showed less variation.
Creating large numbers of find requests, as is the case with the List of Find and Snapshot Find (worst case) methods, seems to have an exponential growth characteristic. The GTRR method seems to follow a more linear progression. Here is a chart which shows this. (You won’t find this data in the demo file however, since it comes from an earlier version of that file.)
Which Takes Longer: Step 1 or Step 2?
The bottom line is the overall time it takes to recreate the found set. But it’s interesting to split the time into the two steps – collecting the IDs and recreating the found set – and see how long each one takes.
For the two methods that rely on find requests, the second step takes the most time. For the GTRR method, it seems to be more evenly split.
Demo File
If you’d like to run the test yourself, download the demo file. Create some test records, set the found set so that it contains a certain number of records, and then use the Ad Hoc buttons to try any of the methods.
Alternatively, you can click the ‘Run Test’ button, which will loop through all five scenarios, increasing the found set in each iteration. The script will break out of the loop once we’ve used up all of the records in the table, or once a single iteration takes longer than a certain time (which you specify) to recreate the found set, or once it completes a certain number of iterations (which you also specify). If you do decide to run this test, you may want to create a sizable number of test records first. For example, you could bring the total number of records up to 1,000,000. This will give you a meaningful size to test with.
Caution: if you do run the test script, it could take a while to complete. For example, running the test with found sets set to go from 10,000 to 100,000 (in 10,000 increments) and using 20 seconds as the duration limit took about 10 minutes.
Here’s a screenshot of the window where the test parameters are configured:
Script parameter length limit for PSoS (error 513)
If you end up looking at the scripts more closely, you will notice that I am passing the IDs along to the server by storing them in a record. Initially I was passing them along as a script parameter, but since I was testing with large found sets, it wasn’t long before I came across error 513, which is what Get ( LastError ) returns when the script parameter for Perform Script on Server exceeds 1 million characters. Note that the error description for 513 is “Record must have a value in some field to be created”; i.e. it’s not at all obvious what the error is for, and it took a bit of googling to find out.
Summary of Methods
Method
Save Found Set
Restore Found Set
Considerations
Preparation
Speed
List of Find
List of summary
Perform Find
—
Create List of summary field
Slow
GTRR
List of summary
GTRR
Number vs. text
Create List of summary field and set up relationship for GTrr
Fast
Snapshot Find
Shapshot link
Perform Find
Best case vs. worst case
Create Get(RecordID)
It depends (fastest for best case, slow for worst case)
Tangential Findings
I came across some unanticipated lessons while setting up and running these tests. However, these lessons only come into play when dealing with very large record counts.
Number vs UUID
Is it better to set up your primary key fields using auto-enter serial numbers or using text UUID values?
As part of my testing, I created 10,000,000 records in the main table. The table is otherwise quite narrow: there are no extraneous fields; just the ID fields. And yet the file size swelled from a few MB to 2.35 GB. In contrast, the size of a text file containing 10 million UUIDs is 370 MB. So clearly there is quite a lot of space used up that is in addition to the actual field contents.
Aside from the file size implication, as was pointed out earlier, performing a GTRR step using a text ID runs more slowly as compared to number ID fields. Here are some results from one of the tests. Note, the times shown include both evaluating the List of summary field and executing the GTRR step.
Found Count
GTRR (Number)
GTRR (Text)
200,000
27 seconds
40 seconds
400,000
37 seconds
85 seconds
600,000
48 seconds
127 seconds
800,000
61 seconds
171 seconds
1,000,000
127 seconds
218 seconds
If your solution has (or has the potential for having) clients syncing an offline copy of a file with a hosted file, then you have to use UUIDs. But if you are dealing with large record counts, and you do not have to worry about syncing records, then you may be better off using number IDs.
Is There a Limit to a Multi-Line Key or to the List of Summary Field?
In trying out the ‘GTRR’ method, I was curious if there would be a limit to the number of values that a multi-line key can have. I haven’t been able to find a firm answer to this question online. So I created 10,000,000 records and tried it out with the file opened locally (i.e. not hosted).
The short answer is that there is a limit for the GTRR step, and in fact, there is also a limit to how many values the ‘List of’ summary field can handle. However, the exact nature of this limit isn’t clear. Neither will give an error; they will simply function incorrectly.
When I tried recreating a 10,000,000 record found set using the GTRR (number) method, it worked, albeit slowly. The GTRR step took 4.6 minutes when run locally.
With the text ID field, the recreated found set was 7,805,664 records, even though the starting point was 10,000,000 records. It took 31 minutes, and no errors were indicated, even though things didn’t work as expected.
I ran this test a second time, and the resultant found set was once again 7,805,664 records, this time in 30.5 minutes.
When I ran it a third time, I got 7,807,436 records in 31.5 minutes. Of these 31 minutes, it took 23 minutes to compute the list of ID values, and the GTRR step itself took 8.5 minutes. Not all of the values made it into the list either; the list had 7,807,444 values. (I didn’t check for this the first two times.) This means that neither the ‘List of’ summary field nor the GTRR step functioned correctly.
Running the test a fourth time resulted in 7,805,664 records in 30.5 minutes. The list of IDs contained 7,805,672 values. Gathering the IDs took 22 minutes, and the GTRR step took 8.5 minutes.
Test
Original Found Count
Count of Values in ‘List of’ Summary Field
Resultant Found Set After GTRR
Delta
‘List of’ Time
GTRR Time
Total Time (mins)
1
10,000,000
—
7,805,664
—
—
—
31
2
10,000,000
—
7,805,664
—
—
—
30.5
3
10,000,000
7,807,444
7,807,436
8
23
8.5
31.5
4
10,000,000
7,805,672
7,805,664
8
22
8.5
30.5
— means “not measured”
Evaluating a ‘List of’ or running a GTRR step when dealing with millions of records in the found set is very much an outside case; it is not typical. And if you are dealing with this situation, it may be that you have outgrown FileMaker. So the uncertainty of this not functioning correctly is likely irrelevant to your situation. However, I am including my findings here, because it still is a conceivable situation, and it might turn out to be useful information for some of you.
Optimizing The Script
An obvious optimization would involve the case when the found set is showing no records or when it’s showing all records.
If we are dealing with large record counts, there is another simple optimization that we can do. If the found set that we will recreate is more than half the size of the table (the total number of records in the table), then recreating the flip side of the found set will be faster, and once we’re done, we can flip the found set again to show omitted records. (The script would also need to exclude any new records that may have been created since the script started executing.)
The demo file does not do any of these optimizations, but they would be simple enough to implement.
Lastly, if you are dealing with a workflow that requires PSoS processing and where the found sets are typically large, it may make more sense to pass along the search criteria to the script and recreate the find there instead of passing along the IDs. It may not be possible or feasible to do this, but if it is, that may be the better approach.
Test Environment
The testing was done using FileMaker Pro Advanced 13.0v5 on a MacBook Pro (10.10.2) and FileMaker Pro Advanced 14.0.4 on an iMac (10.11.3). I ran the tests first with the file opened locally and again with the file hosted on FileMaker Server 14.0.4.413 and accessing it over the WAN.
The post Recreating a FileMaker Found Set appeared first on Soliant Consulting.