Content of this article: It describes how to troubleshoot the common faults of the Antminer D7 and how to use the test fixture to locate them accurately. Content organized, typeset, and shared by ZEUS MINING.
Ⅰ. Preparation requirements for repair platform/tools/equipment
1. Platform requirements:
Anti-static repair workbench (the workbench must be grounded) and a grounded anti-static wrist strap.
2. Equipment requirements:
(1) Constant-temperature soldering iron (350°C-380°C) with a pointed tip, used to solder small SMD parts such as chip resistors and capacitors;
(2) Hot air gun and BGA rework station, used to remove and solder chips / BGA packages;
(3) Multimeter with steel pins soldered to the probes and covered with heat-shrink tubing for easier measurement (Fluke 15B+ multimeter is recommended);
(4) Oscilloscope (UTD2102CEX+ is recommended) and a network cable (requirement: a stable Internet connection).
3. Test tool requirements:
(1) APW12 power supply: AP12_12V-15V_V1.2, plus a power adapter cable, used to power the hash board. The cable is self-made: use thick copper wire for the positive and negative leads between the power supply and the hash board; 4AWG copper wire shorter than 60cm is recommended. It is for PT1 and repair testing only;
(2) Test fixture with the V2.3 control board (test fixture material number ZJ0001000001). Discharge resistors must be installed across the positive and negative terminals of the test fixture; a cement resistor of 20 ohms rated at 100W or more is recommended.
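As a rough sanity check on the 20 ohm / 100W recommendation, the steady-state power in a resistor connected directly across the supply rail is P = V²/R. The sketch below assumes that the resistor sees the full rail voltage and takes the 12V-15V range from the power supply model name; the function name is illustrative only.

```python
def discharge_resistor_power(rail_voltage_v: float, resistance_ohm: float = 20.0) -> float:
    """Steady-state power in watts for a resistor across the rail: P = V^2 / R."""
    if resistance_ohm <= 0:
        raise ValueError("resistance must be positive")
    return rail_voltage_v ** 2 / resistance_ohm


# ASSUMPTION: the rail is somewhere between 12 V and 15 V while the fixture is powered.
for volts in (12.0, 15.0):
    watts = discharge_resistor_power(volts)
    print(f"{volts:.0f} V across 20 ohm -> {watts:.2f} W")
```

Under these assumptions the continuous dissipation stays far below 100W, so the recommended rating leaves a wide thermal margin.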
4. Maintenance auxiliary materials / tools requirements:
(1) 138°C solder paste, flux, Mechanic lead-free circuit board cleaner, and anhydrous alcohol;
(2) Mechanic lead-free circuit board cleaner is used to clean off the flux residue after repair;
(3) Thermally conductive gel is applied to the chip surface after repair;
(5) When installing a new chip, tin the chip pins first and then solder the chip to the hash board. Apply thermally conductive gel evenly on the surface of the chip, and then fasten the heat sink.
(7) Serial port adapter board: RS232 to TTL, 3.3V.
(8) Self-made short-circuit probe (solder leads to the pins and cover them with heat-shrink tubing to prevent a short circuit between the probe and the small heat sink).
5. Common maintenance spare material requirements:
0402 resistors (0R, 51R, 10K, 4.7K);
0402 capacitors (0.1uF, 1uF).
Ⅱ. Maintenance requirements
1. Use the correct procedure when replacing a chip. After any part is replaced, the PCB must show no obvious deformation. Check the replaced part and the surrounding parts for missing components, open circuits, and short circuits.
2. Repair personnel must have a working knowledge of electronics, more than one year of repair experience, and be proficient in soldering BGA/QFN/LGA packages.
3. After repair, the hash board must be tested at least twice, and every test must be OK for the board to pass!
4. Check that the tools and the test fixture work normally, and confirm the test software parameters of the repair station, the test fixture version, etc.
5. When a chip has been repaired or replaced, run the chip test first and run the functional test only after it passes. For the functional test, the small heat sinks must be soldered properly, the large heat sink must be installed in place (apply the thermally conductive gel evenly before installing the large heat sink), and the cooling fans must run at full speed. When the chassis is used for heat dissipation, insert two hash boards at the same time to form an air duct. The single-sided test in production must also ensure that an air duct is formed (important).
6. When measuring signals, use 4 fans for auxiliary cooling and keep them at full speed.
7. When powering the hash board, connect the negative copper lead of the power supply first, then the positive copper lead, and insert the signal cable last. When disconnecting, reverse the order: remove the signal cable first, then the positive copper lead, and finally the negative copper lead. Not following this order can easily damage U1 and U2 (symptom: not all chips can be found). Before the Pattern test, the repaired hash board must be allowed to cool down; otherwise the Pattern test will come out NG.
8. When replacing a chip, apply solder paste to the pins so that the new chip is pre-tinned, and then solder it to the PCBA for repair.
9. On the repair side, the test fixtures run in Test_Mode and in scan-code mode. After a board passes, the production side sends it back through the line starting from the first test station, and it is then assembled normally and aged (assembled with boards of the same level).
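The connection order in item 7 is easy to get wrong under time pressure. As a minimal sketch, it can be written down as an ordered list, with the removal order derived by reversing it. The step labels and function names are illustrative and are not part of any vendor tool.

```python
# Order for powering a hash board, taken from item 7 above.
CONNECT_ORDER = ("negative copper lead", "positive copper lead", "signal cable")


def disconnect_order():
    """Removal order is the exact reverse of the connection order."""
    return tuple(reversed(CONNECT_ORDER))


def is_valid_connect_sequence(steps):
    """True only if every step is present and performed in the required order."""
    return tuple(steps) == CONNECT_ORDER


print("connect:    " + " -> ".join(CONNECT_ORDER))
print("disconnect: " + " -> ".join(disconnect_order()))
```

A checklist generated this way keeps the two sequences consistent: changing the connection order in one place automatically changes the removal order.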
Ⅲ. Test fixture production and precautions
The test fixture must provide adequate heat dissipation for the hash board and allow convenient signal measurement.
1. Obtain the test fixture, part number ZJ0001000001.
2. Before first use, update the FPGA of the test fixture control board with the 19 series test fixture SD card program: decompress the package, copy it to the SD card, and insert the card into the card slot of the test fixture. Power on and wait about 1 minute until the control board indicator double-flashes 3 times, which shows that the update is complete. (Without this update, the test may wrongly report a particular chip as bad.)
3. Make the test SD card as required. For the single-sided heat sink chip test, decompress the package directly onto the SD card; PT1 does not require code scanning; the software version is . Copy the following files to the SD card to complete it.
4. Make the test SD card as required. The double-sided heat sink 8x Pattern test needs its own SD card, as shown in the figure below; the PT2 test requires a code scanner and a network cable to be connected. The software version is ; copy the following content onto the SD card to complete it;
5. When the double-sided 8x Pattern test is used on the production side, the after-sales side, and the outsourced repair side, a matching code scanner and serial port tool are required. For details, refer to the D7 test guide document.
Ⅳ. Principle Overview
1. Working structure of D7 hash board:
The D7 hash board carries 70 BM1764 chips, divided into 35 voltage domains (groups) of 2 ASIC chips each. The working voltage of the BM1764 chips on the D7 hash board is about 0.3V per domain. The boost circuit U238 outputs 20V, which the linear regulators (U295, U16, U14, U307, U310, U313) convert to the LDO 1.8V supply for groups 29, 30, 31, 32, 33, and 34 (6 groups in total); this 1.8V then feeds the LDOs (U294, U15, U13, U306, U309, U312, U187, U188, U190, U191, U193, U194), which output 0.8V. For groups 1 to 28 and group 35, the 1.8V is derived from VDD 13V through an LDO, and a further linear regulator derives the 0.8V from that 1.8V. As shown in the figure:
Group 28 – Group 1 and Group 35
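The supply arrangement described above can be summarised as a small lookup. This is only a sketch: the group ranges come from the text, but the assumption that chips are paired consecutively into groups (chips 1 and 2 in group 1, and so on) is not stated in the source and should be checked against the board itself. All names are illustrative.

```python
NUM_CHIPS = 70
CHIPS_PER_GROUP = 2
NUM_GROUPS = NUM_CHIPS // CHIPS_PER_GROUP  # 35 voltage domains
BOOST_FED_GROUPS = range(29, 35)           # groups 29..34 take 1.8V from the 20V boost


def group_of_chip(chip_no: int) -> int:
    """Map chip number (1..70) to its voltage group.

    ASSUMPTION: chips are paired consecutively (1-2 -> group 1, 3-4 -> group 2, ...).
    """
    if not 1 <= chip_no <= NUM_CHIPS:
        raise ValueError("chip number must be in 1..70")
    return (chip_no - 1) // CHIPS_PER_GROUP + 1


def ldo_1v8_source(group: int) -> str:
    """Return which rail the 1.8V LDO of a group is derived from, per the text."""
    if not 1 <= group <= NUM_GROUPS:
        raise ValueError("group must be in 1..35")
    return "20V boost (U238)" if group in BOOST_FED_GROUPS else "VDD 13V"


for g in (1, 28, 29, 34, 35):
    print(f"group {g:2d}: 1.8V derived from {ldo_1v8_source(g)}")
```

When a 1V8 test point reads wrong, a lookup like this tells you whether to look at the boost circuit or at the 13V input first.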
2. BXD34601 hash board boost circuit:
The boost circuit is powered by the 13V from the power supply and steps it up to 20V, as shown in the figure.
3. D7 chip signal direction:
(1) The CLK (XIN) signal is generated by the 25MHz crystal oscillator Y2 and is passed from chip No. 01 to chip No. 70; its voltage is about 0.9V;
(2) The RST and CI signals enter at pin 3 (3.3V) of J3, are converted by the level conversion ICs U1-U3-U4, and are then passed from chip No. 01 to chip No. 70;
(3) The RX (RI, RO) signal flows from chip No. 70 to chip No. 01, returns through U2 to pin 8 of the signal cable terminal, and then goes back to the control board;
(4) The BO (BI, BO) signal flows from chip No. 01 to chip No. 70.
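When a signal is missing at a chip, the next place to measure is the chip that drives it. The sketch below encodes the directions listed above and returns that upstream neighbour; it is a hypothetical helper, and the signal names are simplified to one label per chain (CLK, RST, CI, BO, RX).

```python
TOTAL_CHIPS = 70
FORWARD_SIGNALS = {"CLK", "RST", "CI", "BO"}  # passed from chip 01 to chip 70
REVERSE_SIGNALS = {"RX"}                      # passed from chip 70 to chip 01


def upstream_chip(signal: str, chip_no: int):
    """Return the chip that feeds `signal` into chip `chip_no`.

    Returns None at the start of a chain, where the signal comes from
    board-level circuitry rather than from another chip.
    """
    if not 1 <= chip_no <= TOTAL_CHIPS:
        raise ValueError("chip number must be in 1..70")
    if signal in FORWARD_SIGNALS:
        source = chip_no - 1
    elif signal in REVERSE_SIGNALS:
        source = chip_no + 1
    else:
        raise ValueError(f"unknown signal: {signal}")
    return source if 1 <= source <= TOTAL_CHIPS else None


print(upstream_chip("CLK", 10))  # forward signal: fed by the previous chip
print(upstream_chip("RX", 10))   # reverse signal: fed by the next chip
```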
4. The whole miner’s structure:
The whole miner consists mainly of 3 hash boards, 1 Antminer D7 control board, an APW12 power supply, and 4 cooling fans, as shown in the figure:
Ⅴ. Common problems and troubleshooting steps of hash board
Phenomenon 1: The single-board test detects 0 chips (PT1/PT2 station)
The first step: check the power output. Measure the voltage at the circled points in the figure below.
The second step: check the voltage output of the voltage domain
The voltage of each voltage domain is about 0.3V; when the 13V supply is present, the domain voltage is generally present as well. First measure the output at the power terminals of the hash board and check whether the MOS is short-circuited (measure the resistance between pins 1, 4, and 8). If 13V is present but there is no domain voltage, continue checking.
The third step: check the PIC circuit
Measure whether pin 11 of U6 outputs 3.3V. If it does, continue troubleshooting. If not, check that the test fixture cable is properly connected to the hash board, and then reprogram the PIC.
PIC programming steps:
(1) Burn the PIC program of the hash board.
Programming tool: PICkit3.5 programmer. Pin 1 of the PICkit3 cable corresponds to pin 1 of J3 on the PCB; pins 1, 2, 3, 4, 5, and 6 must be connected.
(2) Burning software:
① Open MPLAB IPE and select the device: PIC16F1704;
② Click Power to select the power supply method, then click Operate;
③ Use Select file to locate the .HEX file to be burned;
④ Click Connect and confirm that the connection succeeds;
⑤ Click the Program button;
⑥ When programming finishes, click Verify; a verification-complete message confirms that the burn succeeded.
The fourth step: check the output of the boost circuit. Measure C69 as shown in the following figure; the voltage should read 20V.
The fifth step: check each group of LDO 1.8V or PLL 0.8V output.
Group 28 – Group 1 and Group 35
The sixth step: check the chip signal output (CLK/CI/RO/BI/RST)
Refer to the voltage ranges given in the signal direction description. If a measured voltage deviates greatly, compare it with the value measured in an adjacent group.
When the LCD screen of the test fixture displays EEPROM NG, check whether U10 is soldered properly;
When the LCD screen of the test fixture displays PIC sensor NG, the temperature reading is abnormal. Troubleshoot as follows:
a) Check whether the four resistors R214, R215, R1071, and R1076 are soldered properly;
b) Check whether the temperature sensor chips U5, U7, U8, and U9 are soldered properly (sensor positions as shown in the figure) and whether their 3.3V supply is normal. Also check the soldering quality between the small heat sink and the chip next to each temperature sensor; a deformed large heat sink causes poor chip heat dissipation and affects the temperature difference.
Phenomenon 2: The single-board test finds too few chips (PT1/PT2 station)
a) When the LCD screen of the test fixture displays ASIC NG: (0), first confirm that the total domain voltage and the 20V boost output are normal. Then use the short-circuit probe to short the RX test point to the 1V8 test point between the 1st and 2nd chips, and run the program again. Check the serial port log; if 0 chips are still found, the cause is one of the following:
a-1) Use a multimeter to check whether the 1V8 and 0V8 test points read 1.8V and 0.8V. If not, the 1.8V or 0.8V LDO circuit of this domain may be abnormal, or the two ASIC chips in this domain may be poorly soldered. Most such cases are caused by a short-circuited 0.8V or 1.8V SMD filter capacitor (measure the resistance of the filter capacitors of the related chips on the hash board).
a-2) Check the circuits around U1, U2, and U5 for abnormalities such as poorly soldered resistors.
a-3) Check whether the first chip has poorly soldered pins (during repair it has happened that the pins looked tinned from the side, but after the chip was removed they turned out not to be tinned at all).
b) If one chip is found in step a), the first chip and the circuit before it are good, and the following chips are checked in the same way. For example, short the 1V8 test point to the RO test point between the 38th and 39th chips. If the log finds 38 chips, the first 38 chips are fine. If 0 chips are found, first check whether the 1V8 is normal; if it is, the fault lies among the first 38 chips. Continue bisecting until the faulty chip is found. Suppose the Nth chip is faulty: when 1V8 and RO are shorted between the (N-1)th and Nth chips, N-1 chips are found, but when 1V8 and RO are shorted between the Nth and (N+1)th chips, no chips are found at all.
c) When the LCD displays ASIC 69: (69 chips reported), the hash board detects only 69 chips at a frequency of 500M. After locating the chip position with the short-circuit method, first retest to see whether the result is the same; if it is, the chip is bad and must be replaced;
d) Repair method: bisect. Use the short-circuit probe to short the 1V8 test point to the RX test point between the 38th and 39th chips. If the log finds 38 chips, the first 38 chips are fine. If the short is placed at the 47th chip and the log reports 46 chips, the 47th chip is not being detected; if visual inspection shows nothing wrong, the 47th chip can generally be replaced. (You can also swap chips 46 and 47. If, after swapping, shorting at the 46th chip reports 45, the fault follows the chip, and the chip now in position 46 should be replaced);
e) When the LCD displays ASIC NG: (always reporting the same chip), there are two cases:
The first case: the reported chip number does not change from test to test. In this case, repair by measuring the signal voltages in the usual way.
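The bisection described in steps b) and d) is an ordinary binary search over the chip chain. The sketch below is a minimal model of it, not a tool that talks to the fixture: `probe(k)` is a hypothetical callback standing for "short 1V8 to RO/RX between chip k and chip k+1, run the test, and read the chip count from the serial log". It assumes a single faulty chip and that all k chips answer only when the short is placed before the fault.

```python
from typing import Callable


def find_faulty_chip(probe: Callable[[int], int], total_chips: int = 70) -> int:
    """Locate the first faulty chip by bisection.

    probe(k) returns the number of chips found with the short placed
    between chip k and chip k+1 (1 <= k <= total_chips - 1).
    ASSUMPTION: the board is already known to fail, so a fault exists.
    """
    lo, hi = 1, total_chips          # the faulty chip number lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid) == mid:        # chips 1..mid all answered
            lo = mid + 1             # so the fault is after chip mid
        else:
            hi = mid                 # otherwise it is at or before chip mid
    return lo


def make_simulated_probe(bad_chip: int) -> Callable[[int], int]:
    """Simulated board: a short placed past the bad chip finds no chips."""
    return lambda k: k if k < bad_chip else 0


print(find_faulty_chip(make_simulated_probe(47)))
```

On a 70-chip chain this needs at most 7 probe placements. The search here starts at the arithmetic midpoint, whereas the text starts between the 38th and 39th chips; any starting point works as long as the interval is narrowed the same way.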
Phenomenon 3: Incomplete reply nonce data (PT2 station)
PS: Note that ASIC numbering starts from 0, i.e. 000-069.
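Because the numbering starts at 0, it is easy to pull the wrong chip. A tiny conversion helper avoids off-by-one mistakes; it assumes that index 000 corresponds to chip No. 01 in the signal chain, which the text implies but does not state outright. The function names are illustrative.

```python
TOTAL_CHIPS = 70


def asic_index_to_chip_no(asic_index: int) -> int:
    """Convert a 0-based ASIC index (0..69) to a 1-based chip number (1..70)."""
    if not 0 <= asic_index < TOTAL_CHIPS:
        raise ValueError("ASIC index must be in 0..69")
    return asic_index + 1


def chip_no_to_asic_label(chip_no: int) -> str:
    """Convert a 1-based chip number to the zero-padded 3-digit ASIC index."""
    if not 1 <= chip_no <= TOTAL_CHIPS:
        raise ValueError("chip number must be in 1..70")
    return f"{chip_no - 1:03d}"


print(asic_index_to_chip_no(46))
print(chip_no_to_asic_label(70))
```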
Phenomenon 4: The chip test is OK, but the PT2 function test displays PLUG OFF BEFORE TEST OVER.
The software applies a delay during the test, and the tester unplugged the cable before the voltage had dropped to 0. Restart the fixture and test again.
Requirements for the PT2 test environment: The temperature of the PT2 test environment should be between 20°C and 30°C. When the ambient temperature exceeds 35°C, the software will stop the test.
PT2 test power supply requirements: With the PT2 test fixture power supply under a load of 1500 watts (when testing one board), the actual output voltage must not be more than 0.03V below the value set in the configuration file. (For example, if the configuration file specifies 15V, the output voltage of the power supply must not be lower than 14.97V under a 1500 watt load.)
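The two PT2 preconditions above are simple threshold checks. The sketch below expresses them as code using only the numbers given in the text; the function names and the returned status strings are illustrative and do not come from the test software.

```python
MAX_VOLTAGE_DROP_V = 0.03            # allowed shortfall under a 1500 W load
RECOMMENDED_AMBIENT_C = (20.0, 30.0) # recommended PT2 ambient range
SOFTWARE_STOP_AMBIENT_C = 35.0       # above this the software stops the test


def supply_voltage_ok(configured_v: float, measured_v: float) -> bool:
    """True if the loaded output is no more than 0.03 V below the configured value."""
    # Small tolerance so that 14.97 V against a 15 V setting is not rejected
    # because of floating-point rounding.
    return measured_v >= configured_v - MAX_VOLTAGE_DROP_V - 1e-9


def ambient_status(ambient_c: float) -> str:
    """Classify the ambient temperature against the PT2 requirements."""
    if ambient_c > SOFTWARE_STOP_AMBIENT_C:
        return "stop"
    low, high = RECOMMENDED_AMBIENT_C
    if low <= ambient_c <= high:
        return "ok"
    return "outside recommended range"


print(supply_voltage_ok(15.0, 14.97), supply_voltage_ok(15.0, 14.96))
print(ambient_status(25.0), ambient_status(36.0))
```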
Ⅵ. Problems caused by the control board
1. The whole miner does not work
7007 Control Board
(1) Check whether the voltages at the voltage output points are normal. If 3.3V is short-circuited, disconnect U8 first; if the short remains, remove the CPU and measure again. For other abnormal voltages, generally replace the corresponding converter IC.
(2) If the voltage is normal, please check the welding condition of the DDR/CPU (X-RAY inspection on the production side)
(3) Try to update the flash program with SD card;
For a miner whose control board has been flashed from the SD card to start normally, the following two steps are required:
a) After the card has flashed successfully, the green LED indicator stays on; power off and restart;
b) After powering on again, wait 30s (the time needed to enable OTP);
c) OTP (One Time Programmable) is a type of MCU memory that can be programmed only once: after the program is burned into the IC, it cannot be changed or erased;
(1) A sudden power failure during OTP enabling, or a wait shorter than 30s, causes OTP enabling to fail. The control board then will not start (no network connection), and U1 (the control board's main control IC, FBGA) must be replaced. The removed U1 can no longer be used on the D7 series;
(2) Once OTP has been enabled on a control board, its U1 cannot be used on other miner series;
Amlogic control board
If the web page cannot be opened, the control board may not have been flashed. First burn it with the image file and the encrypted file, then upgrade the IM or OM for testing.
2. The whole miner cannot find the IP
Most likely the IP cannot be found because the miner is not running properly. Refer to point 1 for troubleshooting.
Check the appearance and welding condition of the network port, network transformer T1, and CPU.
3. The whole miner cannot be upgraded
Check the appearance and welding condition of the network port, network transformer T1, and CPU.
4. The whole miner fails to read a hash board or a chain is missing
A. Check the cable connection.
B. Check the control board parts that correspond to the chain.
C. Check the wave soldering quality of the header pins and the resistors around the plug-in interface.
Ⅶ. Whole-miner failure phenomena
1. Preliminary test of the whole miner
Refer to the test process document. Problems at this stage are generally assembly process problems and control board process problems.
Common phenomena: the IP cannot be detected, the fan speed is abnormal, or a chain is abnormal. If an abnormality appears during the test, repair according to the monitoring interface and the test log. The repair methods for the initial test and the aging test of the whole miner are the same.
2. Aging test: during the aging test, repair according to what the monitoring interface shows, for example:
1) Abnormal fan display: check whether the fan works normally, whether its connection to the control board is normal, and whether the control board itself is abnormal.
2) Missing chain: one of the 3 boards is missing. In most cases the connection between the hash board and the control board is at fault; check the cable for an open circuit. If the connection is OK, run PT2 on the board. If it passes, the control board is almost certainly at fault; if it fails, repair it with the PT2 repair method.
3) Abnormal temperature: generally the temperature is too high. The monitoring system allows a maximum PCB temperature of 90°C; above 90°C the miner raises an alarm and cannot work normally. The usual cause is a high ambient temperature; abnormal fan operation can also cause abnormal temperature.
4) Not all chips are found (the miner starts, but the hashrate is 2/3 or 1/3 of the normal value): if chips are missing, refer to the PT2 test and repair method.
5) No hashrate after running for a period of time and the mining pool connection is interrupted: check the network;
6) The aging test state of the normal good miner:
7) The single-board test is OK, but one board shows low hashrate after the whole miner is assembled and tested: the first board has no hashrate after running for about 3 minutes.
The specific analysis method is as follows:
① Run PT2 on the board first to see whether it is OK. If not all chips are detected, repair the board. If the board tests OK, take it out and connect it to the test fixture on its own. Load a Debug mining program onto the fixture, set the fan speed to 100%, and set the voltage and frequency to the working values of the whole miner. Let it mine and watch whether it loses hashrate.
② If it still loses hashrate, reduce the frequency to 400M with all other conditions unchanged. Let it mine again and watch whether it loses hashrate and whether the hash board shows X. If it still shows X and loses hashrate, remove the heat sink of the hash board, let it mine, and when the hashrate drops, measure whether the domain voltages are normal. Generally the domain voltage of the faulty domain is abnormal. Then measure the RI signal to see whether it is interrupted. If the RI signal is missing, the chip is most likely short-circuited by solder bridging or damaged.
Ⅷ. Other matters needing attention
Maintenance flow chart
1. Routine inspection: first, visually inspect the hash board to be repaired for PCB deformation or scorching; if present, deal with it first. Also look for parts with obvious burn marks, parts knocked out of position, or missing parts. Second, once the visual inspection shows no problem, measure the resistance of each voltage domain to detect short circuits or open circuits; if any is found, deal with it first. Third, check whether the voltage of each domain is about 0.32V.
2. After the routine inspection is OK (the short-circuit check in particular is necessary, to avoid burning chips or other parts when power is applied), run the chip test with the test fixture and locate the fault from the test results.
3. Based on the result displayed by the test fixture, start near the faulty chip and check the chip test points (CO/NRST/RX/XIN/BI) and voltages such as VDD0V8 and VDD1V8.
4. Following the signal flow, the RX signal travels in the reverse direction (chip 70 to chip 1), while CLK, CO, BO, and RST travel in the forward direction (chip 1 to chip 70); find the abnormal point by following the power supply sequence.
5. Once the faulty chip is located, re-solder it: add flux (preferably no-clean flux) around the chip and heat the solder joints at the chip pins until the solder melts, so that the pins and pads re-wet and bond, achieving the effect of re-tinning. If the fault remains after re-soldering, replace the chip directly.
6. After repair, the hash board must pass the test fixture at least twice to be judged a good product. First test: after replacing parts, wait for the hash board to cool down, test it with the fixture, and once it passes set it aside to cool further. Second test: wait a few minutes until the hash board has cooled completely, then test again.
7. After the hash board is repaired OK, make the relevant repair/analysis records (repair report requirements: date, SN, PCB version, tag number, cause of failure, attribution of responsibility for the failure, etc.) so that they can be fed back to production, after-sales, and research and development.
8. After recording, install the board into a complete miner for regular aging.
9. Good boards repaired at the production end must go back through the line from the first production station (at least the appearance inspection and the PT1/PT2 test stations).
10. For a repaired hash board, the large heat sink must be removed and the thermally conductive gel reapplied before the board goes back through the line!