Network packet loss means that packets sent toward a destination are lost in transit, for various reasons. It is usually observed when we use ping to query a target host. Ping uses ICMP echo request and echo reply messages. An ICMP echo request is a query sent by a host or router to a specific destination host. The host that receives it must send an ICMP echo reply back to the source host. This exchange tests whether the destination is reachable and gives some insight into its status.
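To make the echo request concrete, here is a minimal sketch that builds the bytes of an ICMP echo request as described in RFC 792. It uses the one's-complement checksum from RFC 1071. The identifier, sequence number, and payload values are arbitrary illustrations. The code only builds and verifies the message and does not send it: actually sending it would need a raw socket and usually administrator privileges.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type 8 = echo request (type 0 = echo reply), per RFC 792


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add 16-bit big-endian words
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """Header layout: type, code, checksum, identifier, sequence number, then data."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)  # computed with the checksum field set to 0
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"abcdefgh")
# The receiver re-checksums the whole message; an intact packet yields 0.
print(internet_checksum(packet))  # 0
```

The sequence number is what lets ping report which individual requests went unanswered.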
We often run into intermittent faults in network connections. Faced with such failures, many network administrators use the ping command to test connectivity. Suppose the results show serious packet loss on the transmission line. What causes it? Is the cable contact unstable? Is it a network virus? Or are there other factors?
Cause one: physical line failure
Suppose the network administrator finds that the WAN line keeps going up and down. The cause may be a fault in the line itself or a problem on the user's side. To tell whether it is a line fault, run the following test.
If the WAN line terminates on a router, log in to the router and use an extended ping to send a large number of packets to the WAN interface of the peer router. If the line terminates on a Layer 3 switch, connect a computer to each end of the line. Set each computer's IP address to the WAN interface address of the local Layer 3 switch, then test with "ping <peer computer address> -t".
If no packet loss occurs in this test, the line supplied by the carrier is good. The fault lies on the user's side and needs further investigation.
If packet loss does occur in this test, the fault is in the line supplied by the carrier. Contact the line provider to resolve it as soon as possible.
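The decision rule above can be written as a small sketch. It works from the sequence numbers of the echo replies that came back during the long test. The helper names `summarize_probes` and `verdict` are hypothetical. The code assumes probes are numbered 0 to sent-1 and that you have already collected the replied sequence numbers from the ping output.

```python
def summarize_probes(sent: int, replied_seqs: list[int]) -> dict:
    """Summarize a long ping test from the sequence numbers that got a reply."""
    received = {s for s in replied_seqs if 0 <= s < sent}  # drop duplicates and stray replies
    lost = [s for s in range(sent) if s not in received]
    loss_pct = 100.0 * len(lost) / sent if sent else 0.0
    return {"sent": sent, "received": len(received), "lost": lost, "loss_pct": loss_pct}


def verdict(summary: dict) -> str:
    """Rule from the text: any loss on the carrier-only test points at the line."""
    if summary["loss_pct"] > 0:
        return "loss on the provider's line: contact the line provider"
    return "line looks good: look for the fault on the user side"


result = summarize_probes(10, [0, 1, 2, 4, 5, 6, 7, 9])
print(result["lost"], result["loss_pct"])  # [3, 8] 20.0
print(verdict(result))
```

Keeping the list of lost sequence numbers, not just the percentage, shows whether losses come in bursts (typical of a flapping line) or are scattered.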
Physical lines cause packet loss in many other ways too, for example:
• fiber connection problems
• patch cords not properly seated in device interfaces
• faulty twisted-pair cables or RJ-45 connectors

In addition, random data errors caused by burst noise on the line, radio-frequency interference, and signal attenuation can all cause packets to be lost. A network tester can help check the quality of the line.
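To see why line errors turn into packet loss, note that a single corrupted bit makes the receiver discard the whole packet. Here is a minimal worked example. It assumes bit errors are independent at a fixed bit error rate (BER). Burst noise clusters errors together, so real loss figures will differ. The BER value used below is only an illustration.

```python
import math


def packet_error_probability(ber: float, packet_bytes: int) -> float:
    """P(at least one corrupted bit) = 1 - (1 - BER)^bits, assuming independent bit errors."""
    bits = 8 * packet_bytes
    # log1p/expm1 keep the result accurate when BER is tiny
    return -math.expm1(bits * math.log1p(-ber))


for size in (64, 1500):
    print(size, f"{packet_error_probability(1e-6, size):.4%}")
```

With a BER of 10^-6, roughly 0.05% of 64-byte packets but about 1.2% of 1500-byte packets are corrupted. This is why large transfers suffer first on a noisy line.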
Cause two: equipment failure
A device fault mainly means a hardware fault in a device. It does not include packet loss caused by improper software configuration. Examples include:
• a bad network card
• a physical fault on one of a switch's ports
• a faulty connection between an optical transceiver's electrical port and the network device
• a duplex mode mismatch between the interfaces of two devices
I once saw a case of packet loss caused by a failed fiber-optic module on a switch port. After communicating for a while, the switch would crash, that is, stop communicating, and it returned to normal after a restart. After a period of observation, one fiber-optic module was found to be faulty. Once it was replaced, everything worked normally.
The reason is that the switch performs CRC error detection and a length check on every received packet. It discards packets with errors and forwards correct ones. In this case, however, some corrupted packets passed both the CRC check and the length check. These packets were neither forwarded nor discarded. Instead they accumulated in the dynamic buffer and were never sent out. When the buffer filled up, the switch crashed, and as a result packets could not reach the destination host.
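The two checks a switch applies can be sketched as follows. This is a simplified model of an untagged Ethernet frame, not switch firmware. The frame check sequence (FCS) is taken to be the standard CRC-32 computed by `zlib.crc32`, appended least-significant byte first. Valid sizes are taken as 64 to 1518 bytes including the FCS.

```python
import zlib

MIN_FRAME = 64    # bytes, including the 4-byte FCS (shorter frames are "runts")
MAX_FRAME = 1518  # untagged Ethernet frame, including FCS


def add_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a CRC-32 frame check sequence to the frame contents."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + fcs.to_bytes(4, "little")


def frame_is_valid(frame: bytes) -> bool:
    """Length check first, then recompute the CRC and compare it with the FCS."""
    if not MIN_FRAME <= len(frame) <= MAX_FRAME:
        return False
    body, fcs = frame[:-4], frame[-4:]
    return (zlib.crc32(body) & 0xFFFFFFFF).to_bytes(4, "little") == fcs


good = add_fcs(bytes(range(60)))  # 60 bytes of contents + 4-byte FCS = 64 bytes
bad = bytearray(good)
bad[10] ^= 0x01                   # flip a single bit in transit
print(frame_is_valid(good), frame_is_valid(bytes(bad)), frame_is_valid(add_fcs(bytes(20))))
```

CRC-32 catches every single-bit error, but no CRC catches every multi-bit error pattern. That gap is why, in the case above, some damaged packets could slip past both checks.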
Cause three: network congestion
Network congestion can raise the packet loss rate for many reasons, but the main one is heavy use of router resources.
If the network is slow and the packet loss rate keeps rising, run the show processes cpu and show processes memory commands. Usually the IP Input process turns out to be consuming too many resources. Next, check whether fast switching has been disabled on the high-traffic outbound interface. If it has, re-enable it.
Then check whether same-interface fast switching is disabled. If an interface is configured with multiple network segments and there is heavy traffic between those segments, the router falls back to process switching. In this case, run the command ip route-cache same-interface on that interface.
Next, use the show interfaces and show interfaces switching commands to identify the interfaces where large volumes of packets enter and leave. Once you have confirmed the inbound interface, enable IP accounting on the outbound interface to examine the traffic's characteristics. In an attack, the source address changes continuously while the destination address stays the same. You can temporarily mitigate such problems with an access list, preferably on a device close to the attack source. The final solution is to stop the attack at its source.
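The attack signature described above (many changing sources, one fixed destination) can be checked with a short sketch. It assumes you have already exported (source, destination) address pairs, for example from IP accounting output. Parsing the router output is not shown. The function name and the `min_sources` threshold of 100 are hypothetical and should be tuned to your network.

```python
from collections import defaultdict


def suspicious_destinations(flows, min_sources=100):
    """flows: iterable of (src_ip, dst_ip) pairs.
    Returns destinations reached by at least `min_sources` distinct sources."""
    sources_per_dst = defaultdict(set)
    for src, dst in flows:
        sources_per_dst[dst].add(src)
    return {dst: len(srcs) for dst, srcs in sources_per_dst.items() if len(srcs) >= min_sources}


# Illustrative data: 250 different sources hitting one host, one busy but legitimate pair.
flows = [(f"198.51.100.{i}", "192.0.2.10") for i in range(250)]
flows += [("203.0.113.5", "192.0.2.20")] * 500
print(suspicious_destinations(flows))  # {'192.0.2.10': 250}
```

Counting distinct sources per destination, rather than raw packet counts, is what separates a spoofed flood from a single heavy but legitimate transfer.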
Many other situations also cause congestion in practice. A large volume of UDP traffic, for instance, can be addressed by dealing with the spoofing attack behind it. Other examples:
• large numbers of multicast streams and broadcast packets crossing the router
• a router configured with IP NAT that carries many DNS packets

Once any of these congests the network, the two communicating parties apply flow control and discard the packets that cannot be transmitted.
Cause four: improper MTU configuration
Improper MTU settings on critical devices can also cause packet loss (Ethernet: 1500 bytes; IEEE 802.3/802.2: 1492 bytes). Check the MTU configuration of the critical devices on your network.
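The MTU arithmetic can be sketched as follows. It assumes IPv4 with a 20-byte header without options and an 8-byte ICMP header. `max_ping_payload` gives the largest ping data size that fits through a given MTU. You can probe that limit with the don't-fragment flag: `ping -f -l 1472` on Windows, or `ping -M do -s 1472` on Linux. `fragment_count` shows how many fragments a larger datagram is split into.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
ICMP_HEADER = 8  # bytes


def max_ping_payload(mtu: int) -> int:
    """Largest ping data size that fits in one packet without fragmentation."""
    return mtu - IP_HEADER - ICMP_HEADER


def fragment_count(datagram_size: int, mtu: int) -> int:
    """Number of IPv4 fragments for a datagram (header included) crossing a link."""
    if datagram_size <= mtu:
        return 1
    per_fragment = (mtu - IP_HEADER) // 8 * 8  # fragment data must be a multiple of 8 bytes
    data = datagram_size - IP_HEADER
    return -(-data // per_fragment)            # ceiling division


print(max_ping_payload(1500), max_ping_payload(1492))  # 1472 1464
print(fragment_count(1500, 1492))                      # 2
```

A full-size 1500-byte Ethernet packet therefore has to be split in two on a 1492-byte link. If fragmentation is not allowed (DF set), the packet is dropped instead.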
Once packet loss has been located, the network administrator needs to analyze its cause in order to eliminate the fault. After opening the network analysis software, we configured the network profile, selected the capture file, and started the analysis.
First, add utilization statistics to the chart. The chart shows that after 14:38:05 network utilization rose suddenly to nearly 40%. The recommended utilization is no more than 15%. Once utilization exceeds 30%, packet loss of about 1% appears, and it grows geometrically as utilization rises further. In this network utilization reached 40%, so serious packet loss is almost certain.
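Utilization is simply the bits carried in an interval divided by what the link could carry in that interval. The sketch below applies the 15% and 30% thresholds quoted above to per-second byte counts. The link speed and byte counts are made-up illustrations, not data from the capture in the chart.

```python
RECOMMENDED_MAX = 15.0  # percent, threshold quoted in the text
LOSS_THRESHOLD = 30.0   # percent, above which the text expects packet loss


def utilization_pct(byte_count: int, interval_s: float, link_bps: float) -> float:
    """Share of link capacity used during one sampling interval."""
    return 100.0 * byte_count * 8 / (interval_s * link_bps)


def flag_intervals(samples, interval_s: float, link_bps: float):
    """samples: list of (timestamp, bytes seen in that interval)."""
    flagged = []
    for ts, byte_count in samples:
        u = utilization_pct(byte_count, interval_s, link_bps)
        if u > LOSS_THRESHOLD:
            flagged.append((ts, round(u, 1), "packet loss likely"))
        elif u > RECOMMENDED_MAX:
            flagged.append((ts, round(u, 1), "above recommended"))
    return flagged


# Hypothetical 100 Mbit/s link sampled once per second.
samples = [(0, 1_000_000), (1, 2_500_000), (2, 5_000_000)]
print(flag_intervals(samples, interval_s=1.0, link_bps=100e6))
```

The first interval (8%) passes. The second (20%) is flagged as above the recommendation, and the third (40%) as likely to lose packets.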
Because packet loss causes TCP retransmissions, the network administrator can use the diagnosis view to find the hosts with severe TCP retransmission.
How to determine whether there is network packet loss
We usually use the command ping x.x.x.x -t to test whether the network is losing packets. On Windows, -t keeps pinging until you stop it with Ctrl+C.
As the figure above shows, a continuous ping from this machine to the non-existent address 192.168.122.2 loses every ICMP packet it sends, a loss rate of 100%. In other words, packets are lost on the path from the local machine to the unreachable address 192.168.122.2.
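When pings are scripted, the loss rate can be read from the summary that ping prints at the end. The sketch below handles the English-language Windows format ("Sent = 4, Received = 0, Lost = 4") and the Linux/macOS format ("4 packets transmitted, 0 received"). It assumes English output. Localized Windows versions word this line differently, and the regular expressions would need adjusting.

```python
import re

WINDOWS = re.compile(r"Sent = (\d+), Received = (\d+)")
UNIX = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")


def loss_from_ping_output(text: str) -> float:
    """Return the packet loss percentage found in a ping summary."""
    m = WINDOWS.search(text) or UNIX.search(text)
    if m is None:
        raise ValueError("no ping summary found")
    sent, received = int(m.group(1)), int(m.group(2))
    return 100.0 * (sent - received) / sent if sent else 0.0


windows_summary = "Packets: Sent = 4, Received = 0, Lost = 4 (100% loss),"
linux_summary = "10 packets transmitted, 9 received, 10% packet loss, time 9013ms"
print(loss_from_ping_output(windows_summary), loss_from_ping_output(linux_summary))
```

Computing the rate from the sent/received counts, rather than trusting the printed percentage, keeps the result consistent across platforms that round differently.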
Analysis steps for locating network packet loss
When the network is losing packets, users clearly notice that it is slow. The first thing the network administrator should do is run PING X.X.X.X -t to roughly identify which network segment is affected. If packet loss is confirmed, Kelai software can be used for further analysis.
Before the analysis, we need some background knowledge.
One characteristic of TCP is that it guarantees reliable data transmission, that is, data arrives correctly and completely. How does TCP guarantee this? It uses an acknowledgment and retransmission mechanism. The sender numbers the data it transmits, so each segment carries a sequence number. The receiver sends the sender an acknowledgment for the data it has received. In this way the sender can confirm whether the data was delivered correctly. If it cannot confirm that a segment arrived, or learns that it did not, it retransmits that segment.
Therefore, when a network loses packets that carry TCP traffic, TCP retransmissions are bound to appear.
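A simple way to find the hosts with heavy retransmission is to count data segments whose sequence number has already been seen on the same connection. The sketch below assumes segments have been extracted from a capture as (src, dst, sport, dport, seq, payload_len) tuples in capture order. This is a deliberately simple heuristic. Real analyzers also handle partially overlapping retransmissions, keep-alives, and sequence-number wraparound.

```python
from collections import Counter


def retransmissions_by_host(segments):
    """Count data segments whose (connection, seq) was already seen, per sending host."""
    seen = set()
    per_host = Counter()
    for src, dst, sport, dport, seq, length in segments:
        if length == 0:
            continue  # pure ACKs carry no data, so they are not retransmissions
        key = (src, dst, sport, dport, seq)
        if key in seen:
            per_host[src] += 1
        else:
            seen.add(key)
    return per_host


segments = [
    ("10.0.0.5", "10.0.0.9", 50000, 80, 1000, 1460),
    ("10.0.0.5", "10.0.0.9", 50000, 80, 2460, 1460),
    ("10.0.0.9", "10.0.0.5", 80, 50000, 1, 0),        # ACK from the server
    ("10.0.0.5", "10.0.0.9", 50000, 80, 1000, 1460),  # same data sent again
]
print(retransmissions_by_host(segments).most_common())  # [('10.0.0.5', 1)]
```

The addresses and ports are illustrative only.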
Solution:
• For network device failures: use segmented capture. Capture packets with the Kelai network analysis system on both sides of each critical device in the network. This shows whether that device is dropping packets, which pinpoints the device responsible.
• For network congestion: configure port mirroring on the core switch and capture packets with the Kelai network analysis system.
Analyze the traffic on critical links (usually egress links). Check whether utilization is too high, whether there are too many packets per second, whether the packet size distribution is reasonable, and whether TCP sessions are normal.
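The comparison at the heart of segmented capture can be sketched as a set difference: any packet seen on a device's input side but never on its output side was dropped by that device. The sketch assumes each packet can be identified by a hypothetical key such as (source, destination, IP ID). That holds only if the device does not rewrite these fields (NAT would break it) and the capture window is short enough that IP IDs do not repeat.

```python
def dropped_by_device(upstream: list, downstream) -> tuple:
    """Packets captured on the input side but never on the output side, plus the drop rate."""
    seen_after = set(downstream)
    dropped = [pkt for pkt in upstream if pkt not in seen_after]
    rate = 100.0 * len(dropped) / len(upstream) if upstream else 0.0
    return dropped, rate


# Illustrative keys: (src, dst, ip_id); the packet with IP ID 2 never leaves the device.
before = [("10.0.0.1", "10.0.1.1", i) for i in range(4)]
after = [pkt for pkt in before if pkt[2] != 2]
print(dropped_by_device(before, after))  # ([('10.0.0.1', '10.0.1.1', 2)], 25.0)
```

Running the same comparison hop by hop narrows the loss down to a single device.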
Of course, the most fundamental remedy is to limit user traffic, that is, to control the traffic of each Internet user. For example, block access to video sites and other sites unrelated to work. You can also apply precise per-user traffic limits so that no one overuses the limited bandwidth.
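Per-user limits are commonly enforced with a token bucket, shown here as one possible technique. The source does not say which mechanism its tools use. Each user's bucket refills at a fixed rate up to a burst size, and a packet is admitted only if enough tokens remain. The rate and burst values below are hypothetical. Timestamps are passed in explicitly so the behaviour is easy to follow.

```python
from collections import defaultdict


class TokenBucket:
    """Admit packets at up to `rate_bps` on average, allowing bursts of `burst_bytes`."""

    def __init__(self, rate_bps: float, burst_bytes: float):
        self.rate = rate_bps / 8.0  # refill rate in bytes per second
        self.capacity = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = None            # time of the previous packet

    def allow(self, packet_bytes: int, now: float) -> bool:
        if self.last is not None:
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False  # over the limit: drop or queue the packet


# One bucket per user, e.g. 2 Mbit/s with a 30 KB burst (hypothetical policy).
buckets = defaultdict(lambda: TokenBucket(rate_bps=2_000_000, burst_bytes=30_000))


def admit(user: str, packet_bytes: int, now: float) -> bool:
    return buckets[user].allow(packet_bytes, now)


print(admit("alice", 1500, now=0.0))
```

The burst size lets short interactive exchanges through at full speed, while sustained heavy downloads are held to the average rate.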
Quality of Service (QoS) can also be applied to some traffic. Work-related traffic, such as web browsing and email, can be prioritized. This relieves congestion to some extent and ensures that high-priority traffic is forwarded first. (This treats the symptoms rather than the root cause.)
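The simplest form of prioritization is a strict-priority scheduler. Packets in a higher class are always sent before lower ones, and order is first-in, first-out within a class. In the sketch below, treating destination ports 25, 80, and 443 as "work" traffic is a hypothetical classification rule. Strict priority can starve low-priority traffic under sustained load, which is why real QoS deployments often use weighted queues instead.

```python
import heapq
import itertools

WORK_PORTS = {25, 80, 443}  # hypothetical: mail and web count as work traffic


def classify(dst_port: int) -> int:
    """Lower number = higher priority."""
    return 0 if dst_port in WORK_PORTS else 1


class PriorityScheduler:
    """Strict-priority queue, FIFO within each priority class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves arrival order

    def enqueue(self, priority: int, packet) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2] if self._heap else None


sched = PriorityScheduler()
for port, name in [(6881, "bulk-1"), (443, "web-1"), (25, "mail-1"), (6881, "bulk-2")]:
    sched.enqueue(classify(port), name)
print([sched.dequeue() for _ in range(4)])  # ['web-1', 'mail-1', 'bulk-1', 'bulk-2']
```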
In addition, there are several reasons why pinging an IP address may keep losing packets:
• IIS on the server is running abnormally, or the site does not have a separate application pool. Find the site and give it its own application pool.
• A site on the server has an empty host header, which easily causes this problem. Delete the site with the empty host header, or move it into its own application pool.
• The server has low bandwidth or a traffic cap. Some IDC providers place strict limits on hosted servers so that they can host more customers. This leaves very little outbound bandwidth and causes packet loss.
• A problem with a switch port. In one case, ping first showed packet loss at irregular intervals, which initially suggested a physical-layer cause. After the RJ-45 connector on the network cable was re-crimped, the fault remained unchanged, so the NIC or the switch port was suspected. The NIC driver was correct and the NIC interface showed nothing abnormal. On the switch, the status indicator of the port connected to the server was flashing between green and yellow, which means the port was not working properly. Logging in to the switch with HyperTerminal showed that this port was running in 100 Mbit/s full-duplex mode. Back on the server, the local connection status showed the NIC running in 10 Mbit/s full-duplex mode, so the NIC's speed did not match the switch port's. After the NIC was set to 100 Mbit/s full-duplex, everything returned to normal and the fault was solved.
• Heavy packet loss caused by DDoS or flooding attacks. There is little to be done here other than adding a hardware firewall.
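The speed and duplex comparison from the switch-port case can be scripted on a Linux server. `ethtool <interface>` prints lines such as "Speed: 100Mb/s" and "Duplex: Full". In the sketch below, the switch-side values are assumed to be typed in by hand after checking the port on the switch, and both helper names are hypothetical.

```python
import re


def parse_ethtool(text: str):
    """Extract (speed in Mb/s, duplex) from `ethtool <iface>` output."""
    speed = re.search(r"Speed:\s*(\d+)Mb/s", text)
    duplex = re.search(r"Duplex:\s*(\w+)", text)
    return (int(speed.group(1)) if speed else None,
            duplex.group(1).lower() if duplex else None)


def link_mismatches(nic, switch_port):
    """Compare (speed, duplex) tuples from both ends of the link."""
    problems = []
    if nic[0] != switch_port[0]:
        problems.append(f"speed: NIC {nic[0]} Mb/s vs switch {switch_port[0]} Mb/s")
    if nic[1] != switch_port[1]:
        problems.append(f"duplex: NIC {nic[1]} vs switch {switch_port[1]}")
    return problems


sample = "Settings for eth0:\n\tSpeed: 10Mb/s\n\tDuplex: Full\n"
print(link_mismatches(parse_ethtool(sample), (100, "full")))
```

For the case above this reports a single speed mismatch (10 vs 100 Mb/s) and no duplex mismatch.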
In summary, the general troubleshooting checklist is:
• Is the bandwidth saturated?
• Try another switch port.
• Try another network cable.
• Are the network card and motherboard drivers installed correctly? (This is rarely the problem.)
• Is the switch port set to 100M or 10M, and does that match the machine's setting?