Thanks for my colleague Joe who spent the effort documented this and here we want to share our experience with you.    
Summary 
We have successfully migrated a huge volume data into Dynamics CRM 2016 On-Prem. We end up created nearly 400 million records in the Dynamics CRM 2016 database. This article summarizes the migration progress, and share some performances scenarios that will affect the overall migration plan.  
To avoid too specific, some of the figure in this article is only a reference about the order of data volume, like around 50 million of Email records. 
Production Server Setup 
The production server farm for the Dynamics CRM 2016 is composed of 4 front end servers, 2 backend servers, 2 database servers in a cluster. The database server has 32 CPU and 256G RAM. One thing to high-light is that we are using SAN with traditional HDD instead of SSD hard disk for SQL data storage. The data file is split into 5 volumes to maximize the IO speed. We have seen IO throughput up to 500MB/s in database HDD, however we normally cannot achieve that high throughput when loading data into CRM. The high-volume IO throughput can only be achieved during database server operations, like re-index, or backup. And it turns out that the HDD performance is the main factor that limit our data loading speed in this case. 
Data was loaded into the CRM using several data migration servers in parallel. The data is loaded through the customer .Net command line program. The data is already pre-formatted in a SQL server. Reading data from the source SQL server will not be a performance bottleneck because it has been tuned specifically for this data migration. And it has been proven hundred times quick than loading data into the CRM server.  
We have tried to run the command line data migration program in the frontend server, because we though we can save the network time when sending HTTP request to frontend servers through CRM SDK API. We found the CPU consumption can be boosted to 100% if we run the data migration program in the frontend server. We end up using 4 separate data migration servers to run the data migration program. There are 8 CPU and 32G RAM in these data migration servers. However, the CPU of RAM is not the key factor to the data migration performance because the actual bottleneck is on the CRM database server. 
The Data 
There are 50+ entities in CRM that need data migration. The fundamental entity is the built-in Contact entity, which is used to store the Customer. In this case every customer is an individual, there is no company concept in current solution. Other entities are all related to the customer directly or indirectly.  
Directly related to the customer are a number of entities for structure data such as Phone Numbers, Addresses and Email Addresses. They are all custom instead of build-in CRM entities. These are separated entities instead of a fields in Contact entities because they have relative complex structure, and system allow 1:N relationships.  
There are also a number of Activity type entities like Phone Calls, Emails, Mails, SMS etc. These are built-in or custom CRM Activity type Entities. This group of data has biggest impact on data loading speed. Activities are loaded much slow than other non-activity type entities. And in our case we have a very large volume of Activity type records.  
Activity type entities has the largest volume of data. We have around 70 million records in activities, which will create nearly 350 million records in CRM database because each such business record will create 5 records in backend database, one in the entity itself, one in ActivityPointer, and three in ActivityParty. We’ll come back later in this article about some issues we had with the Activities.  
Server CPU Usage 
We have loading data into CRM continuously in 3 weeks. The CPU usage is not high in our case. As a reference, we normally have: 
- Less than 30% CPU in CRM frontend servers;
- Less than 10% CPU in CRM backend servers;
- Less than 30% CPU in data migration servers;
- Less than 30% CPU in database servers;
- Occasionally 100% CPU in the source database server.
So the CPU is not affecting the data loading performance. We have seen occasionally 100% CPU usage in source SQL database server. This source server database is actually a staging database. We have done almost all data transform before loading data into this staging database. However, there are still some join works, particularly those not join by the key, but rather join by fields and need to run some functions like Left or Middle. These operations can cause 100% usage in SQL server when the data volume is large. Luckily we can normally finish such operations in source database server in minutes, so it will not affect the overall data loading performance. 
The sorting or conditioning may sometimes use a lot of CPU as well, we suggest adding Indexes carefully to avoid long operations in SQL for sorting and conditioning. Pre-defined index can help speed up source database, and avoid timeout errors in the data migration program.  
Hard Disk Usage 
We will focus on the HDD usage of the CRM database server because HDD usage in all other servers in CRM frontend servers, CRM backend servers and data migration servers are pretty low, and will not impact the data migration performance. One thing confused us a lot when we first found performance issues is the HDD throughput. We found the HDD throughput is not big in database server either. Normally it only has less than 10MB/s throughput, sometime it can reach up to 30MB/s, but it is far less than 500 MB/s that we have seen in some SQL operations. We were not convinced that SQL server was the bottleneck by looking at the throughput alone. However, we noticed a measure of “% Highest Active Time”, the blue line of Disk part in following screen capture, in Windows Resource Monitor is constantly high, and sometimes reach 100% constantly for nearly 5 minutes. We finally find this is an indicator when the server is limited by the HDD performance. 
The Data Migration Program 
We have developed a command line data migration program using .Net and CRM SDK. The program read a configuration in JSON file to load data from source into CRM. We used late-binding, and using SQL statement in the JSON configuration file to read data from source database. This way we don’t assume any hardcoded data structure in either source database, or the CRM entity. To maximize the program performance, we have special handling of following data fields: 
- Option Set fields
- Lookup fields
- State field
- DateTime field
The data migration program is a command line .Net program, and we can make use of multi-thread features to generate multiple request to the server for boosting the data load performance. We’ll talk more about behavior of multi-thread in a moment.  
The data migration program is also designed to load data from multiple servers. Some data is referring to data in other entities, so these entities actually have dependencies that will affect the data loading sequences. The program was designed to coordinate between servers so it will wait if a block has dependence on other blocks.  
For any fields in the source data table, we are loading the data into the same name field except the lookup fields or owner fields. If a field in source does not exist in the target entity, it will be skipped. If a field in the target entity does not exist in source data, the default value will be used for newly created record, or keep original value if it is updating the record. No deleting logic was implemented in the program. 
CRM Requests 
The data migration program has used following CRM requests to load data: 
- CreateRequest for creating new records;
- UpdateRequest for updating exist records;
- SetStateRequest for changing record state; and
- CloseIncidentRequest for closing a case record.
ExecuteMultipleRequest was used to execute multiple request at a time for improving the overall performance. 
Log and Error Handling 
Comprehensive logging features was developed for logging. Different level of information will be logged according to the configuration setting. If “Trace” level was set, everything will be logged include each Request sent to CRM and Response from CRM. This detail log is very helpful in debugging within large volume data loading. We use the “Trace” level in Mock Conversion Test, but we used “Error” level in actual data loading to minimize the data dump. 
We are loading a large volume of data, the program will be run non-stop for weeks. The program will not stop even if there are any record failed. The error record will be logged, and we can study and handle it afterward. The program can be paused at any time, so we can reload error records after fixing them if there are other records depends on them. 
Because we need weeks to load data, so we have used an initial data load that will run weeks. And then we load data incrementally. This way we can load last batch of data into CRM in 4 hours during the cutover night. However, the detail of handling complex situations in Initial-Delta load will not be covered in this article. 
OptionSet, Lookup and DateTime Field 
OptionSet values will be pre-loaded into the data migration program when it starts up. This design can improve the overall performance because we don’t need to query the option set every time loading data. However, because the program will run weeks, so OptionSet in CRM should not change during this data loading period. 
Every record in CRM has a GUID key. This key is pre-generated in the source database. Lookup field can be filled directly from data source, instead of query CRM every time loading a record. Internally there is a conversion in such field name, and program will generate EntityReference accordingly. Detail of handling logic will not be discussed in this article. 
All DateTime field in CRM is saved in UTC. However, if all local information is set correctly in the server, DataTime filed should be populated with local time when passing it to the Requests.  
Status Field 
For each entity in CRM, there is a State field and a Status field. The values in Status field is linked to a particular State value. We need to use the SetStateRequest to set the value if it is not the default State. And for the Incident entity, we need to use the CloseIncidentRequest to set value for certain State and Status value. These two request need to be called for existing record. We cannot use one single call to create record and update state/status field together. For the data loading efficient, we use two step approach. First we create all records for that entity, second we update status for all records in that entity. This approach is only used for those entity we know most of the records are in Inactive status.  
Circularly Referenced Field 
We have case when one entity use lookup field to refer other entity, and in that other entity has also a lookup field to refer back to first entity. For such scenarios we need to create 1st entity records, then create 2nd entity records will all reference to 1st entity records set, and then update 1st entity records to fill in the lookup field value that is referring the 2nd entity records. So we actually need twice the process, which means twice the time needed for data migration, to fully populate records for certain entities, that is the 1st entity in above example. 
Data Loading Performance 
We studied several cases for CRM data migration performance. Summarized from them, we estimated that we can achieve 600,000 records per hour when loading data into CRM using the CRM SDK. During Mock Conversion Test we tested the speed using real data. We found a number of issues that will affect the data loading speed, and we finally using different ways to make sure we can finish the task on time. In the actual data loading. With 4 servers loading data in parallel, we can load data much quick than planned 600,000 records per hour for many entities, but we have much lower speed for Activity type records. We finally spend 3+ weeks non-stop to load all data into CRM server. 
Data Loading Speed 
Data was loaded entity by entity. There is dependence between records, we have arranged the data loading sequence to first load data without dependence, and then the data that could be depending on records in other entities. We are loading data from 4 servers in parallel. And Contact entity is the base of many other entities, so we first loaded Contact entity in all 4 servers, then we load other entity by carefully arranged sequences.  
The CRM organization is basically an empty organization when we start to load the Contact entity. And we can achieve around 900,000 records per hour in one server, so 4 servers together we can achieve 3,600,000 records per hour. 
When we start to load other entities, the speed changes significantly between different entities. However, almost all entities can achieve 300,000+ records per hour per server until we start to load Activity type entities.  
When we start to load Activity entities, the speed is around 130,000 records per hour. And as we have loaded 10s of millions records in the organization, the speed start to decrease, until finally around 60,000 records per hour per server. And there is also significant difference in speed for entities like Letter, Email and a custom entity for SMS. The difference is around 110,000 records for Letter, and 130,000 for Email initially.  
After we have loaded around 100 million business records in the CRM, which actually created nearly 400 million data records in SQL server, adding new data in Contact entity is also very slow. The speed degraded to around 60,000 records per hour per server.  
One issue we found in Mock Conversion Test is that the speed of updating Contact is extremely slow. In many cases it will timeout. I traced the situation and found the issue was from the Cascade behavior. Particularly the cascade of Name and Owner. When we update the Name field in Contact entity, it will try to update all related entities for the YomiName field. And it need to update many records from the Activity type records too. Without proper index, the searching and updating operation in SQL server can timeout. Similarly, when we update the owner field, the configurable cascade behavior will try to update owner field of all related records, and it can be timeout for those large volume entities. 
Multi-threading and Data Loading Speed 
The data loading program can use multi-thread to send HTTP request to server simultaneously. We have tested the speed increase using multithread feature. Each thread will receive a certain number of data, send request to server and wait. During a test we measured the speed by using 1 thread, 3 threads and 10 threads. Following table show average speed  
| 
Threads  | 
Speed (Records/Hour)  | 
CRM Front End CPU Usage  | 
| 
1  | 
100,000  | 
30%  | 
| 
3  | 
300,000  | 
50%  | 
| 
10  | 
400,000  | 
60%  | 
The data loading speed in CRM is highly depends on a particular entity. Some entity can load data very quick, some will be very slow. The above figure is not exactly the speed for a particular entity, it has only a relative meaning when compare with each other. The speed increase linearly until 3 threads, and then slows down after it. 
The request is sent to CRM by each thread when it has read enough data for a batch, e.g. 1000 records. Because reading data from the source database is very quick, so many requests is sent to server almost at the same time. This behavior is related to one issue we encountered for some entities that is loaded very slow. We found that when the data is loaded very slow for some entities, some records in one batch will all fail, and the error message like this: An unsecured or incorrectly secured fault was received from the other party. My hypothesis is that the Requests was sent to CRM at almost the same time, but CRM will process it batch by batch. For example, if processing 5 batches, each contains 1000 requests, need 5 minutes, then the 6th request will be authentication failure because of the token timeout. Our data migration program will run several weeks continuously. The speed at later stage will be very slow because it was handling slow loading entities, and the volume of these entities becomes larger as we loaded more data in. We finally choose to use 5 threads, with each batch contains 1000 request, so we can have acceptable data loading speed, but can also avoid token security issue even for those very slow loading entities. 
Same Entity Data Loading 
We have 4 data migration servers, that can send data creating/updating requests to 4 CRM frontend servers respectively. We also tested whether there will be a speed impact if we are loading data into same entity from multiple data migration servers. Our test didn’t find any significant speed difference while we load data for different entities from different data migration servers, and if we load data into same entity from 4 different servers. 
Loading of Activity Entities 
We had several issues with loading Activity type entities. We noticed it is loaded much slower than other entities in very early stage of mock conversion test. We received a lot of errors of timeout when we first try to load them with 20 threads. It didn’t help a lot even we change to load data with only 5 threads and after we loaded a certain amount of records. We traced possible reasons, the first problem we found is that it that CRM will create a record in PrincipalObjectAccess (POA) entity for each Activity type record. And POA is regarded as a major source of CRM performance issues. Creating a record in POA for each Activity records come from the cascade behavior of these entities. In our Activity entities, we have a lookup to the Campaign entity. And CRM will create a record in POA to get the shared permission for the Campaign record owner if it is not the same as the Activity owner. We end up created 10s of millions records in POA which was normally consider too much for POA entity.  
After we changed the configuration for the cascade behavior, in next mock conversion test, a lot of timeout appeared again after we loaded nearly 30 million Activity records. When we study the case, we found the issue of HDD high “% Highest Active Time” issue, and later we found we need to fight for this issue in many different situations. The actual speed we can safely achieve without causing timeout is only 10,000 records per hour per server. By additional study we found that there are 50+ Indexes in the ActivityPointer entity, which is common for all Activity type entities. And creating a new record and updating all these 50+ indexes is a source of performance issue. So we disabled most of these Indexes except 5 which will be used when loading data into CRM. After disabling these Indexes the data loading speed is resumed to nearly 70,000 records per server per hour, which can support us load all 70 million Activity records in an acceptable time frame. 
The next issue we found about the Activity records is when we update the Contact record, it will timeout. We traced the backend SQL queries and found that the SQL timeout appears when CRM is trying to update Activity records because of Cascading from Contact entity. Although there are 50+ Index in ActivityPointer entity, but there is no proper index that can support updating several records related to that particular Contact records within 10s of million AcitivityPointer records. After we added a new Index, the problem was resolved. 
Another thing we found in data loading is that the Clustered Index for AcitivityPointer entity and other entities as well will become fragment as we are loading millions of records in the table. The Contact entity is very important because it will be queried many times as it is referenced by many other entities. The AcitivityPointer and AcitivityParty is also very important because we have too many records there. Periodically rebuild the index will help maintain the data loading speed. But careful plan is necessary, because we need nearly 7 hours to rebuild index for the AcitivityPointer entity. Too much rebuild will not help a lot for the overall data loading task. Luckily we have design a PAUSE flag in the data migration program, once set the data migration processes in all data migration servers will be paused, we can finish the rebuild index task in the database server, and then reset the flag to resume the data loading process. By this mechanism we execute 5 rebuild index task during 3 weeks of data loading period, safely completed the data loading execution without causing any data error or loss because of timeout. 
 
