The Era of AI and High-Speed Computing: Potential Risks of White Box Servers
With the arrival of 32GT/s PCIe 5.0 products, AI and high-speed computing have steadily proven their value. Generative AI in particular has been integrated into people’s daily lives quickly because of the growing demand for high-speed services, and it now plays an essential role in various industries. This drives the continuous growth of the server manufacturing industry and also shortens the product life cycle of modern servers.
Buyers in the white box server market usually put cost first when purchasing a customized server, so they source each part of the server from different vendors according to their needs. The chassis, motherboard, and storage devices are all crucial to the stability of the server system. However, now that PCIe 5.0 has been introduced to motherboards, these customized servers face some potential risks.
Five Potential Risks of White Box Servers
1. Uneven Heat Dissipation
CPUs that support PCIe 5.0 can have a thermal design power (TDP) of 350W or higher, which demands strong heat dissipation. Different motherboards use different CPU layouts, some of which may not line up with the chassis fans, causing uneven heat dissipation across the CPU.
2. Inefficient Fan
When the TDP of the CPU is greater than 350W, the system usually needs to be equipped with a high-performance fan. When the fan specifications of the chassis fall short, the system temperature will rise due to insufficient heat dissipation. This can reduce operation speeds and access speeds, and sometimes even triggers a system thermal shutdown.
3. Cable Routing
Since the chassis and the motherboard are purchased separately, the positions of the connectors on the motherboard may not match the design of the chassis. Cables may then have to cross the airflow path, which obstructs airflow and reduces cooling performance. This type of potential risk is not easily noticed.
4. Wire Quality
PCIe 5.0 places very high demands on cable quality. Chassis manufacturers are usually not familiar with newer high-frequency technology, so the cables and wires they supply may cause performance issues.
5. Backplane Design
Because chassis manufacturers are often unfamiliar with newer high-frequency technology, they also face challenges when designing the backplane for the storage devices, such as impedance mismatch, high insertion loss, poor return loss, crosstalk, and more. These issues accumulate along the signal path and increase the overall risk.
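To make the backplane risk more concrete, here is a minimal Python sketch of two of the effects listed above: the return loss caused by an impedance mismatch, and the way insertion loss adds up across the signal path. All numbers are illustrative assumptions and not Allion measurements: the 85 ohm target impedance, the per-segment losses, and the 36 dB end-to-end budget (a figure often quoted for 32GT/s links at 16 GHz) should be checked against the PCIe specification and your own channel data.

```python
import math


def return_loss_db(z_load_ohm: float, z_ref_ohm: float) -> float:
    """Return loss in dB for a load impedance seen from a reference impedance.

    A larger value means less energy is reflected at the discontinuity.
    """
    gamma = abs((z_load_ohm - z_ref_ohm) / (z_load_ohm + z_ref_ohm))
    if gamma == 0:
        return math.inf  # perfect match: nothing is reflected
    return -20.0 * math.log10(gamma)


def insertion_loss_margin_db(segment_losses_db: dict[str, float],
                             budget_db: float) -> float:
    """Margin left after summing per-segment insertion losses (positive dB)."""
    return budget_db - sum(segment_losses_db.values())


# Hypothetical case: a 100 ohm backplane trace on a link designed for 85 ohm.
rl = return_loss_db(z_load_ohm=100.0, z_ref_ohm=85.0)

# Hypothetical insertion loss per segment at 16 GHz, in dB.
segments = {
    "motherboard": 14.0,
    "cable": 8.0,
    "connectors": 4.5,
    "backplane": 7.0,
    "drive": 5.0,
}
margin = insertion_loss_margin_db(segments, budget_db=36.0)

print(f"return loss at the mismatch: {rl:.1f} dB")
print(f"insertion loss margin: {margin:.1f} dB")
```

In this made-up example no single segment looks alarming, yet the total exceeds the assumed budget by 2.5 dB. This is the kind of accumulated risk that is hard to see when each part is bought separately.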
The potential risks mentioned above can lead to reduced performance, shortened CPU life, and system instability. In more severe cases, the system may get stuck in a restart loop, shut down, or crash from overheating. The various services that the server provides may become unstable or fail, sometimes even causing data loss. This leads to a negative customer experience and numerous customer complaints, eventually affecting your brand’s reputation.
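For the thermal risks (items 1 and 2), a rough first check is to estimate how much airflow is needed to carry away the CPU heat. The sketch below uses the standard heat balance for air, airflow = power / (density x specific heat x temperature rise). The air properties are approximate sea-level values, the function name is our own, and the 45 CFM delivered airflow is a hypothetical figure. Only the CPU is counted, so a real system needs more.

```python
AIR_DENSITY_KG_M3 = 1.2            # approximate, sea level, around 20°C
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0  # approximate specific heat of dry air
M3_PER_S_TO_CFM = 2118.88          # cubic metres per second to cubic feet per minute


def required_airflow_cfm(heat_w: float, delta_t_c: float) -> float:
    """Airflow needed to remove heat_w watts with an air temperature rise of delta_t_c."""
    if delta_t_c <= 0:
        raise ValueError("delta_t_c must be positive")
    flow_m3_s = heat_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_c)
    return flow_m3_s * M3_PER_S_TO_CFM


cpu_tdp_w = 350.0           # TDP figure used in this article
allowed_rise_c = 10.0       # assumed inlet-to-outlet temperature rise
delivered_cfm = 45.0        # hypothetical airflow actually reaching the CPU

needed_cfm = required_airflow_cfm(cpu_tdp_w, allowed_rise_c)
print(f"needed: {needed_cfm:.1f} CFM, delivered: {delivered_cfm:.1f} CFM")
if delivered_cfm < needed_cfm:
    print("airflow shortfall: expect rising temperatures or throttling")
```

Under these assumptions a 350W CPU needs roughly 61 CFM through its heatsink, so a chassis fan that delivers 45 CFM at that spot falls short even before memory, storage, and add-in cards are counted.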
Allion’s Reliability Simulation Solution
To counteract these potential risks, Allion can provide a reliability simulation solution to verify and ensure the quality of your servers. These simulations will be designed and evaluated according to the following parameters.
First, the upper and lower limits of the server’s operating temperature will be measured. To confirm how the server operates at high and low temperatures, different temperature cycles will also be run depending on the application scenario.
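As an illustration only, a temperature cycle can be described as a list of chamber setpoints: ramp to the upper limit, dwell, ramp to the lower limit, dwell, and repeat. The Python sketch below builds such a profile. The function name, the limits, the ramp rate, and the dwell time are all hypothetical placeholders and do not describe Allion's actual test plan.

```python
def temperature_cycle(low_c, high_c, ramp_c_per_min, dwell_min, cycles, start_c=25.0):
    """Return (elapsed_minutes, setpoint_c) breakpoints for a ramp-and-dwell profile."""
    if low_c >= high_c:
        raise ValueError("low_c must be below high_c")
    if ramp_c_per_min <= 0 or dwell_min < 0 or cycles < 1:
        raise ValueError("invalid ramp rate, dwell time, or cycle count")

    points = [(0.0, start_c)]
    elapsed, current = 0.0, start_c

    def ramp_to(target_c):
        nonlocal elapsed, current
        elapsed += abs(target_c - current) / ramp_c_per_min
        current = target_c
        points.append((elapsed, current))

    def dwell():
        nonlocal elapsed
        elapsed += dwell_min
        points.append((elapsed, current))

    for _ in range(cycles):
        ramp_to(high_c)   # heat up to the upper limit
        dwell()           # hold so the server reaches a steady state
        ramp_to(low_c)    # cool down to the lower limit
        dwell()
    ramp_to(start_c)      # return to the starting temperature
    return points


# Hypothetical scenario: 5°C to 40°C, 1°C per minute, 30-minute dwells, 2 cycles.
profile = temperature_cycle(low_c=5.0, high_c=40.0, ramp_c_per_min=1.0,
                            dwell_min=30.0, cycles=2)
for minutes, setpoint in profile:
    print(f"{minutes:6.1f} min -> {setpoint:5.1f} °C")
```

Different application scenarios would then simply use different limits, ramp rates, and cycle counts.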
In addition, the services that the server provides will be taken into consideration when designing the simulation. For example, CPU and DDR loads will be increased for high-speed computing servers, and storage loads will be increased for data storage servers. Each workload verification cycle can confirm the design of each part. Each result will also be recorded in a detailed report, allowing our clients to observe abnormal changes in server performance.
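One simple way to spot abnormal changes in recorded results is to compare each cycle against a baseline taken from the first few cycles. The following Python sketch assumes one throughput number per workload cycle. The function name, the 10% tolerance, and the sample data are hypothetical, and it does not represent Allion's reporting method.

```python
from statistics import median


def flag_abnormal_cycles(throughput_by_cycle, baseline_cycles=3, tolerance=0.10):
    """Return indices of cycles that deviate from the baseline by more than tolerance.

    The baseline is the median of the first baseline_cycles measurements.
    """
    if len(throughput_by_cycle) < baseline_cycles:
        raise ValueError("not enough cycles to form a baseline")
    baseline = median(throughput_by_cycle[:baseline_cycles])
    if baseline <= 0:
        raise ValueError("baseline throughput must be positive")
    return [
        index
        for index, value in enumerate(throughput_by_cycle)
        if abs(value - baseline) / baseline > tolerance
    ]


# Hypothetical storage throughput in MB/s, one value per workload cycle.
measurements = [1000, 1010, 995, 1002, 870, 998, 640]
abnormal = flag_abnormal_cycles(measurements)
print("abnormal cycles:", abnormal)
```

With this sample data, cycles 4 and 6 (counting from 0) are flagged because they drop more than 10% below the baseline. In a real report these would be lined up against the chamber temperature at that time.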
Faster, Easier, Better! The Best Server Consultant For You
As a high-tech application consulting company, Allion has a complete collection of testing equipment, simulation environments, and plenty of experience. We can provide faster, easier, and better high-quality services such as:
Faster
1. We have a complete range of temperature chambers covering -100°C to 200°C, with a large internal space that fits three 52U cabinets and a maximum heat load of 65kW.
2. We have rich experience working with clients and can plan and execute solutions efficiently.
Easier
1. The reliability simulation solution verifies the potential risks in 3-5 days, saving you extra money and time.
2. If an issue arises during verification, we provide problem isolation, debug support, and suggested solutions to resolve it efficiently.
Better
1. We also provide a server life cycle assessment that estimates the product life of your servers and helps make your manufacturing plan more complete.
2. We can help you assure the quality of key components in your products and prevent potential risks in advance.
If you have any further needs for testing, verification, or consulting services related to the server ecosystem, please feel free to explore the following services online or contact us through the online form.
- Allion User Reliability Test Lab https://www.allion.com/test-lab/user_reliability/
- Allion Server Validation Service https://www.allion.com/server-validation/