I. Preparation requirements for maintenance platform / tools / equipment
1. Platform requirements:
Anti-static maintenance workbench (the workbench must be grounded) and a grounded anti-static wrist strap.
2. Equipment requirements:
Constant-temperature soldering iron (350-380 °C); a pointed soldering iron tip is used for soldering small SMD parts such as chip resistors and capacitors. A portable desoldering gun and a BGA rework station are used for removing and soldering chips/BGA packages. Fluke multimeter; FNIRSI oscilloscope.
3. Test tool requirements:
APW12 power supply (AP12_12V-15V_V1.2) and a power adapter cable, used to supply power to the hash board. The cable is self-made: use thick copper wire for the positive and negative poles to connect the power supply to the hash board; 4AWG copper wire shorter than 60cm is recommended. Control board and hash board test fixture: discharge resistors must be installed across the positive and negative poles of the test fixture; a cement resistor of 25 ohms rated at 100W or more is recommended.
4. Maintenance auxiliary materials/tools requirements:
Solder paste, flux, circuit board cleaning solution, and anhydrous alcohol. The circuit board cleaning solution is used to clean off flux residue after repair. Thermal paste is applied to the chip/heat sink after repair (some models require thermal paste). Reballing stencil (steel mesh), desoldering wick, and solder balls (a ball diameter of 0.4mm is recommended). When replacing a chip with a new one, the chip pins must be tinned first. Also prepare a shorting pin, which is used during maintenance to short-circuit the 1.8V and RO signals when locating a fault.
5. Common maintenance spare material requirements:
0402 resistors (55R and 10K); 0402 capacitors (0.1 µF, 1 µF)
II. Maintenance requirements
1. Maintenance personnel must have basic electronics knowledge, more than one year of maintenance experience, and be proficient in soldering BGA/QFN/LGA packages.
2. After maintenance, the hash board must be tested at least twice, and it passes only if every test result is OK.
3. Pay attention to the operating method when replacing a chip. After any component is replaced, the PCB must show no obvious deformation. Check the replaced part and the surrounding parts for open circuits and short circuits.
4. Check that the tools and the test jig work normally, and confirm the parameters of the maintenance station test software, the version of the test jig, etc.
5. When repairing or replacing a chip, run the chip test first and do the functional test only after it passes. For the functional test, make sure that the small heat sinks are soldered properly, that the large heat sink is coated with thermal glue and installed as required, and that the cooling fans run at full speed. When using the chassis for heat dissipation, place two hash boards in it at the same time to form an air duct. When measuring signals, use 4 cooling fans for auxiliary cooling and keep them at full speed.
6. When powering the hash board, connect the negative copper wire of the power supply first, then the positive copper wire, and finally insert the signal cable. When disconnecting, reverse the order: first remove the signal cable, then the positive copper wire, and finally the negative copper wire. If this order is not followed, it is very easy to damage R8, R9, U1, and U2 (symptom: not all chips can be found). Before running the pattern test, let the area that has just been repaired cool down; otherwise, the test will result in Pattern NG (PNG).
7. When replacing a chip with a new one, print solder paste onto the pins so that the chip pins are pre-tinned before the chip is soldered to the PCBA; this avoids poor solder joints.
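The connection order in item 6 can be written down as a small checklist. The following is only an illustrative sketch: the step labels and function names are made up for this example, and the only rule it encodes is that removal is the exact reverse of connection.

```python
# Connection order from item 6; the labels are illustrative names only.
CONNECT_ORDER = [
    "negative copper wire",
    "positive copper wire",
    "signal cable",
]


def disconnect_order(connect_order):
    """Return the removal order: the exact reverse of the connection order."""
    return list(reversed(connect_order))


def is_valid_connect_sequence(steps):
    """True only if the steps match the required connection order exactly."""
    return list(steps) == CONNECT_ORDER


for number, step in enumerate(CONNECT_ORDER, start=1):
    print(f"connect {number}: {step}")
for number, step in enumerate(disconnect_order(CONNECT_ORDER), start=1):
    print(f"remove {number}: {step}")
```

Printing the two lists next to the bench is enough; the validity check is only useful if the sequence is recorded by some other tool.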
III. Test fixture production and precautions
The matching fixture should meet the heat dissipation of the hash board and facilitate the measurement of signals.
1. Use the test fixture SD card to flash the program that updates the FPGA of the control board: decompress the program, copy it to the SD card, and insert the card into the fixture card slot. Power on for about 1 minute and wait for the control board indicator light to double-flash 3 times; the update is then complete.
2. Make the test SD card according to requirements: use the PT1 file to make the SD card for the single-sided heat sink, and use the PT2 file to make the SD card for the double-sided heat sink.
IV. The principle overview
1. Working structure of T19 hash board:
The hash board is composed of 76 BM1398 chips divided into 38 groups (voltage domains), each group consisting of 2 ICs. The working voltage of the BM1398 chip used in the T19 hash board is 0.36-0.38V. Groups 32-38 (the last seven) are powered by the 19V output through the DCDC. VDD (13.64V) of the 38th domain feeds the DCDC of the 31st domain, which outputs 1.8V for the LDO; going back toward the first domain, the voltage drops by 0.36V per domain. In each domain, the 0.8V rail is produced from that domain's 1.8V through the LDO output.
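The figures above can be cross-checked with simple arithmetic. The sketch below assumes that the 38 domains are stacked in series with an identical nominal drop of 0.36V each (an assumption drawn from the "reduced by 0.36V" wording, not a measured model); the function names are illustrative.

```python
CHIPS = 76
DOMAINS = 38
DOMAIN_STEP_V = 0.36  # nominal per-domain voltage quoted in the text


def chips_per_domain(chips=CHIPS, domains=DOMAINS):
    """Number of ICs in each voltage domain."""
    return chips // domains


def stacked_voltage(domain_number, step=DOMAIN_STEP_V):
    """Nominal voltage at the top of a domain (1-based numbering),
    assuming identical series-stacked domains."""
    if not 1 <= domain_number <= DOMAINS:
        raise ValueError("domain_number must be between 1 and 38")
    return round(domain_number * step, 2)


print(chips_per_domain())      # 2 ICs per domain
print(stacked_voltage(38))     # nominal top-of-stack voltage
```

Under this assumption the 38th domain sits at 38 × 0.36 = 13.68V nominal, which is close to the 13.64V quoted for VDD of the 38th domain.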
2. T19 hash board boost circuit:
The boost circuit is fed by the power supply and steps the voltage up from 14V to 19V.
3. T19 chip signal direction:
1) CLK (XIN) signal: generated by the 25M crystal oscillator Y1 and transmitted from chip No.01 to chip No.76. During operation, the voltage is 1.8V (oscilloscope); a multimeter reads about 0.7-1.2V;
2) TX (CI, CO) signal: enters from pin 7 of the IO port (3.3V), passes through the level conversion IC U2, and is then transmitted from chip No.01 to chip No.76. When the IO cable is not inserted, the voltage is 0V; during operation, the voltage is 1.8V (oscilloscope);
3) RX (RI, RO) signal: transmitted from chip No.76 to chip No.01, then through U1 back to pin 8 of the signal cable terminal and on to the control board. When the IO cable is not inserted, the voltage is 0.3V; during operation, the voltage is 1.8V (measured with a multimeter);
4) BO (BI, BO) signal: transmitted from chip No.01 to chip No.76; a multimeter measures 0V;
5) RST signal: enters from pin 3 of the IO port and is then transmitted from chip No.01 to chip No.76. When no IO signal is inserted, it is 0V in standby; during operation, it is 1.8V (measured with a multimeter);
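For quick reference at the bench, the five signals listed above can be kept in a small lookup table. This is a minimal sketch: the operating voltages and directions are taken from the list, while the `Signal` structure, the function names, and the 0.2V tolerance are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    pins: tuple
    first_chip: int    # chip where the signal enters the chain
    last_chip: int     # chip where the signal leaves the chain
    running_v: tuple   # (low, high) expected voltage during operation
    instrument: str    # instrument the quoted value refers to


SIGNALS = {
    "CLK": Signal("CLK", ("XIN",), 1, 76, (0.7, 1.2), "multimeter"),
    "TX": Signal("TX", ("CI", "CO"), 1, 76, (1.8, 1.8), "oscilloscope"),
    "RX": Signal("RX", ("RI", "RO"), 76, 1, (1.8, 1.8), "multimeter"),
    "BO": Signal("BO", ("BI", "BO"), 1, 76, (0.0, 0.0), "multimeter"),
    "RST": Signal("RST", ("RST",), 1, 76, (1.8, 1.8), "multimeter"),
}

TOLERANCE_V = 0.2  # assumed acceptance margin, not specified in the manual


def direction(signal_name):
    """'forward' for chip 01 to 76, 'reverse' for chip 76 to 01."""
    signal = SIGNALS[signal_name]
    return "forward" if signal.first_chip < signal.last_chip else "reverse"


def reading_ok(signal_name, volts, tolerance=TOLERANCE_V):
    """Compare a reading taken during operation with the quoted range."""
    low, high = SIGNALS[signal_name].running_v
    return low - tolerance <= volts <= high + tolerance


print(direction("RX"), reading_ok("CLK", 0.9), reading_ok("RST", 0.3))
```

A reading that falls outside the range is only a hint; comparing it with the same test point in an adjacent group remains the more reliable check.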
4. The whole miner structure:
The whole miner is mainly composed of 3 hash boards, 1 control board, an APW12 power supply, and 4 cooling fans.
V. Common bad phenomena of the hash board and troubleshooting steps
Phenomenon 1: The single-board test detects 0 chips (PT1/PT2 station)
Step 1: Check the power output first.
Step 2: Check the voltage domain voltage output.
The voltage of each voltage domain is about 0.36V. When the 14V supply is present, the domain voltage is generally present too. First, measure the output at the power supply terminal of the hash board and check whether the MOS is short-circuited (measure the resistance between pins 1, 4, and 8). If 14V is present but there is no domain voltage, continue troubleshooting.
Step 3: Check the PIC circuit.
Measure whether pin 2 of U3 has an output; the voltage should be about 3.2V. If it does, continue troubleshooting. If there is no 3.3V, check whether the connection between the test fixture cable and the hash board is OK, and re-burn the PIC program. When the hash board is powered, the 14V supply first passes through the U10 circuit, which outputs about 21V to supply the load circuit of resistors R41 and R42. Pin 11 of PIC U3 outputs about 3V to pin 1 of Q4 (the voltage from pin 1 to pin 2 of Q4, measured with a multimeter). Once pin 1 of Q4 receives this voltage, Q4 conducts between pin 2 and pin 3 and pulls down the voltage at pin 1 of Q5, so that pins 2 and 3 of Q5 conduct. Finally, pin 4 of Q2, Q3, Q6, and Q7 is pulled up to 21V and they turn on, after which the hash board has domain voltage.
1. Burn the PIC program of the hash board.
Programming tool: PICkit3. Pin 1 of the PICkit3 cable corresponds to pin 1 of J3 on the PCB; pins 1, 2, 3, 4, 5, and 6 must all be connected.
2. Burning software:
Open MPLAB IPE and select the device PIC16F1704. Click Power to select the power supply mode (the recommended voltage is 3.375V, not 5V), then click Operate. Step 1: under File, select the .HEX file to be burned. Step 2: click Connect and confirm that the connection is normal. Step 3: click the Program button, then click Verify after programming completes; a verification-complete prompt shows that programming was successful.
Step 4: Check the boost circuit output.
19V should be measurable at C55.
Step 5: Check the LDO 1.8V or PLL 0.8V output of each group.
Step 6: Check the signal outputs of the chip (CLK/CI/RI/BO/RST).
Refer to the voltage ranges given in the signal direction description. If a measured voltage deviates significantly, compare it with the value measured in the adjacent group.
If the voltages on the chip's signal pins are normal but chip detection is still incomplete, for example only 75 chips are detected, locate the fault by short-circuiting RO and 1.8V. Short-circuit one chip after another (for example, short-circuit the 74th chip and check whether 74 chips are detected) until a chip is reached at which the fixture reports exactly that chip number. It can then be determined that the next chip has the problem.
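Steps 1 to 5 amount to checking a handful of voltages in a fixed order and stopping at the first one that is wrong. The sketch below illustrates that idea only: the nominal values come from the steps above, but the measurement-point labels, the 10% tolerance, and the function name are assumptions for this example.

```python
# (step, measurement point, nominal volts) in troubleshooting order.
CHECKS = [
    ("Step 1: power output", "hash board supply", 14.0),
    ("Step 2: domain voltage", "voltage domain", 0.36),
    ("Step 3: PIC circuit", "U3 pin 2", 3.2),
    ("Step 4: boost output", "C55", 19.0),
    ("Step 5: LDO output", "LDO", 1.8),
    ("Step 5: PLL output", "PLL", 0.8),
]

RELATIVE_TOLERANCE = 0.10  # assumed margin; the manual gives no tolerance


def first_failed_check(measured):
    """Return (step, point) for the first reading that is missing or out
    of tolerance, or None if every reading looks plausible."""
    for step, point, nominal in CHECKS:
        value = measured.get(point)
        if value is None or abs(value - nominal) > RELATIVE_TOLERANCE * nominal:
            return step, point
    return None


readings = {"hash board supply": 14.0, "voltage domain": 0.37,
            "U3 pin 2": 3.2, "C55": 14.0, "LDO": 1.8, "PLL": 0.8}
print(first_failed_check(readings))
```

In this made-up set of readings the boost output at C55 is the first value out of range, so the search would stop at Step 4.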
Phenomenon 2: The single-board test detects an incomplete number of chips (PT1/PT2 station)
Check the chip's signals (CLK/CI/RI/BO/RST) before and after the position where the error is reported, locate the fault according to the IC whose signal is abnormal, and repair it with reference to the signal directions and voltage ranges. If the voltages on the chip's signal pins are normal but chip detection is still incomplete, for example only 75 chips are detected, check by short-circuiting RO and 1.8V. Short-circuit one chip after another (for example, short-circuit the 74th chip and check whether 74 chips are detected) until a chip is reached at which the test fixture reports exactly that chip number; it can then basically be determined that the next chip has a problem.
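The short-circuit search can be pictured as a simple loop. The following is a sketch under a stated assumption: short-circuiting RO to 1.8V at chip k makes the fixture report k chips only when chips 1 to k all respond. The `detect_with_short` callback and the simulated fixture are hypothetical stand-ins for the real test fixture, whose exact behavior may differ.

```python
def find_suspect_chip(detect_with_short, total_chips=76):
    """Move the short circuit back from the end of the chain until the
    fixture reports exactly the shorted chip number; the next chip is
    then the suspect. Returns None if the full chain reports correctly."""
    for position in range(total_chips, 0, -1):
        if detect_with_short(position) == position:
            suspect = position + 1
            return suspect if suspect <= total_chips else None
    return 1  # no position reports correctly, so chip 1 is the suspect


def make_simulated_fixture(bad_chip):
    """Hypothetical fixture model: the reported count is correct only when
    the short circuit is placed before the bad chip."""
    def detect_with_short(position):
        return position if position < bad_chip else 0
    return detect_with_short


print(find_suspect_chip(make_simulated_fixture(40)))
```

The loop is linear for clarity; on the bench it is usually quicker to start near the position reported in the log.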
Phenomenon 3: Single-board Pattern NG, that is, the returned nonce data is incomplete (PT2 station)
Pattern NG occurs when the characteristics of a chip differ from those of the other chips, so simply replace that chip. Based on the log information, the replacement rule is to replace the chip with the lowest response rate in each domain. The following picture shows one of the test logs. The log shows that the response rate of four ASICs (36, 37, 43, and 75) is low. ASICs 36 and 37 are in the same domain, so replace whichever of the two has the lower nonce count. Also replace chips 43 and 75.
PS: Note that the domain and ASIC numbers start from 0.
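The replacement rule can be expressed as a short grouping routine. This is a minimal sketch: it assumes that consecutive ASICs are paired into domains (ASIC index divided by 2, with numbering from 0), which matches ASICs 36 and 37 sharing a domain. The nonce counts in the example are invented for illustration and are not taken from a real log.

```python
CHIPS_PER_DOMAIN = 2


def domain_of(asic):
    """0-based domain index of a 0-based ASIC index (assumed pairing)."""
    return asic // CHIPS_PER_DOMAIN


def chips_to_replace(low_rate_nonces):
    """Given {asic index: nonce count} for the ASICs flagged as low,
    keep only the lowest-count ASIC of each domain."""
    worst_in_domain = {}
    for asic, nonces in low_rate_nonces.items():
        domain = domain_of(asic)
        current = worst_in_domain.get(domain)
        if current is None or nonces < low_rate_nonces[current]:
            worst_in_domain[domain] = asic
    return sorted(worst_in_domain.values())


# Hypothetical nonce counts for the four low-rate ASICs named in the text.
example = {36: 120, 37: 95, 43: 80, 75: 60}
print(chips_to_replace(example))
```

With these invented counts, ASIC 37 is chosen over 36 because it has the lower count in their shared domain, while 43 and 75 are each alone in their domains.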
Pattern NG can also appear as follows: the screen refreshes continuously, showing BTC_check_register: reg_value_buf buffer is full! In this case, at least one chip on the PCBA is defective. To repair it, run a single-board test with the 50M single-board PT2 test firmware. When the log shows that 76 chips have been detected and the screen starts refreshing, use a shorting pin to short-circuit the chips one at a time. If a chip is fine, for example when the RO input of the 5th chip is short-circuited to 1.8V, the log stops refreshing after 15 seconds, which proves that the 5th chip has no problem. However, if the screen keeps refreshing 15 seconds after the 6th chip is short-circuited, the 6th chip has a problem: it may be poorly soldered or defective.
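The chip-by-chip check described above follows one decision rule: short-circuit a chip, wait 15 seconds, and treat the chip as suspect if the log is still refreshing. The sketch below only encodes that rule; `short_chip`, `release_short`, and `log_is_refreshing` are hypothetical callbacks standing in for the manual actions and the observation of the log.

```python
import time

WAIT_SECONDS = 15  # observation time after each short circuit, from the text


def find_suspect_chips(short_chip, release_short, log_is_refreshing,
                       total_chips=76, wait=time.sleep):
    """Short-circuit each chip in turn; a chip is suspect if the log is
    still refreshing after the waiting time."""
    suspects = []
    for chip in range(1, total_chips + 1):
        short_chip(chip)
        wait(WAIT_SECONDS)
        if log_is_refreshing():
            suspects.append(chip)
        release_short(chip)
    return suspects
```

Passing a different `wait` function (for example one that does nothing) makes it possible to try the loop against a simulated board without waiting 15 seconds per chip.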
Phenomenon 4: Abnormal temperature reading during the test (PT2 station)
Check whether the four temperature sensors and their matching resistors (U4 with R28~R30, U6 with R31~R33, U7 with R34~R36, and U8 with R37~R39), located on the front of the PCB, are poorly soldered, and whether the 3.3V supply to the temperature sensors is normal.
Check the soldering quality of the heat sink on the opposite side of the chip connected to the temperature sensor. If the heat sink is not soldered properly, the temperature difference will be affected.
VI. Problems caused by the control board
1. The whole miner does not work
The first step is to check whether the voltages at the voltage output points are normal. For example, if 3.3V is short-circuited, disconnect U8 first. If it is still short-circuited, remove the CPU and measure again. For other abnormal voltages, the corresponding converter IC is generally replaced.
The second step: if the voltages are normal, check the soldering condition of the DDR/CPU.
The third step: try to update the flash program with the SD card.
2. The whole miner cannot find the IP
In most cases, the IP cannot be found because the miner is not operating normally. Refer to point 1 for troubleshooting.
Check the appearance and soldering condition of the network port, network transformer T1, and CPU.
3. The whole miner cannot be upgraded
Check the appearance and soldering condition of the network port, network transformer T1, and CPU.
4. The whole miner fails to read the hash board, or the chain is missing
A. Check the cable connection status.
B. Check the parts of the control board corresponding to the chain.
C. Check the wave soldering quality of the plug-in pins and the resistors around the plug-in interface.
VII. The whole miner failure phenomenon
Common miner failure symptoms: the IP cannot be detected, an abnormal number of fans is detected, or a chain is detected abnormally. If an abnormality occurs during the test, follow the prompts on the monitoring interface and in the test log to carry out maintenance.
1. The fan display is abnormal:
Check whether the fan works normally, whether its connection to the control board is normal, and whether the control board itself is abnormal;
2. Missing chain
A missing chain means that one of the 3 hash boards is not detected. In most cases, there is a problem with the connection between the hash board and the control board. Check the cable for an open circuit. If the connection is fine, run a PT2 test on the board to see whether it passes. If it passes, the problem can basically be attributed to the control board. If the test fails, repair the board using the PT2 repair method;
3. Abnormal temperature
Generally, this means the temperature is too high. The maximum PCB temperature allowed by the monitoring system is 90 °C. If it exceeds 90 °C, the miner raises an alarm and cannot work normally. This is generally caused by a high ambient temperature; abnormal fan operation will also cause an abnormal temperature;
4. Insufficient number of chips
If the number of chips is insufficient, refer to the PT2 test for testing and maintenance;
1) One board has a low hash rate: in this case, log in to the miner's IP with the Putty software and check whether the board's domain operating voltage and NONCE return are normal. Repair according to the log prompts in Putty;
2) The specific operations are as follows:
A. Open Putty, enter the IP of the miner in question, and click OPEN;
B. Enter the user name, password, and test command to see the NONCE return and voltage domain status. If the NONCE return or domain voltage is abnormal, measure and repair according to the abnormal chip printed in the log.
Another situation: the test shows that a board consistently has no hash rate or a low hash rate, yet the single-board PT2 test passes. In this case, take out the hash board in question and let it run alone with the 100M test firmware. When the problem appears on this board, measure the signals with a multimeter to find the fault.
VIII. Other matters needing attention
Maintenance flow chart:
~Routine inspection: First, visually inspect the hash board to be repaired for PCB deformation and scorching; if either is found, it must be dealt with first. Check for parts with obvious burn marks, parts knocked out of position, missing parts, etc. Second, once the visual inspection shows no problem, measure the impedance of each voltage domain to detect short circuits or open circuits; if any is found, it must be dealt with first. Third, check whether the voltage of each domain is about 0.36V.
~After the routine inspection is OK (the short-circuit check in the routine inspection is generally necessary, to avoid burning chips or other materials due to a short circuit at power-on), test the chips with the test fixture and locate the fault according to the test fixture results.
~According to the results displayed by the test fixture, start from the vicinity of the faulty chip and check the chip test points (CO/NRST/RO/XIN/BI) and voltages such as VDD0V8 and VDD1V8.
~Then follow the signal flow: the RX signal is transmitted in the reverse direction (chip No.76 to No.1), while the CLK, CO, BO, and RST signals are transmitted in the forward direction (chip No.1 to No.76). Find the abnormal fault point through the power supply sequence.
~When the faulty chip has been located, re-solder it: add flux (preferably no-clean flux) around the chip and heat the solder joints of the chip pins until the solder melts, so that the chip pins and pads re-wet and bond again. If the fault remains after re-soldering, replace the chip directly.
~After a hash board is repaired, it must pass the test fixture at least twice before it is judged a good product. After the parts have been replaced, wait for the hash board to cool down and run the first test with the test fixture; when it passes, set the board aside to cool for a few minutes, then run the second test.