Tesla has published its work on the network protocol TTPOE (Tesla Transport Protocol over Ethernet), which aims to reduce latency in data centers and in infrastructure that supports machine learning systems. The company plans to standardize TTPOE and has joined the Ultra Ethernet Consortium (UEC) to achieve this goal. The TTPOE implementation is written in C and released as open source under the GPLv2 license.
The TTPOE protocol is designed to replace TCP in applications that require low latency and high throughput. Like TCP, TTPOE tolerates dropped packets and retransmits them, guaranteeing that all transmitted data is delivered. TTPOE targets networks with bandwidth above 100 Gbps and was first implemented in hardware to connect the nodes of the Dojo supercomputer.
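To illustrate what "tolerating drops while guaranteeing delivery" means, here is a minimal stop-and-wait retransmission sketch in Python. It is a generic textbook ARQ scheme over a simulated lossy link, not TTP's actual retransmission algorithm, which is not described here; the function name `send_reliably` and the `drops` set used to simulate loss are hypothetical.

```python
def send_reliably(payloads, drops, max_attempts=5):
    """Deliver `payloads` in order over a simulated lossy link.

    `drops` is a set of (kind, seq, attempt) tuples, where kind is "data" or
    "ack"; a matching transmission is treated as lost. This deterministic
    loss model is an assumption made so the example is reproducible.
    Returns (payloads as seen by the receiver, number of retransmissions).
    """
    received = []          # receiver's in-order output
    expected_seq = 0       # next sequence number the receiver will accept
    retransmissions = 0

    for seq, payload in enumerate(payloads):
        for attempt in range(max_attempts):
            if attempt > 0:
                retransmissions += 1  # no ACK before the timeout: resend
            if ("data", seq, attempt) in drops:
                continue  # data packet lost; sender will time out
            # Receiver side: accept new data, discard duplicates caused by a
            # lost ACK, and acknowledge in both cases.
            if seq == expected_seq:
                received.append(payload)
                expected_seq += 1
            if ("ack", seq, attempt) not in drops:
                break  # ACK arrived; move on to the next packet
        else:
            raise ConnectionError(
                f"packet {seq} unacknowledged after {max_attempts} attempts")

    return received, retransmissions


# Lose the first copy of packet 1 and the first ACK for packet 2.
data, resent = send_reliably([b"a", b"b", b"c"], {("data", 1, 0), ("ack", 2, 0)})
print(data, resent)  # [b'a', b'b', b'c'] 2
```

Stop-and-wait keeps only one packet in flight, which would leave most of a 100 Gbps link idle; practical transports keep a window of unacknowledged packets, but the drop, timeout, and resend logic follows the same pattern.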
The introduction of TTPOE made it easier to add new nodes to the Dojo cluster, which processes visual data to train AI models. This workload requires moving large volumes of data between nodes with latencies of only a few tens of microseconds. The protocol was designed to be relatively simple to implement in hardware: it runs on top of Ethernet, with TTP taking the place of TCP in the network stack. It uses a simpler finite state machine than TCP and works with existing Ethernet switches.
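Running on top of Ethernet means a transport header can follow the 14-byte Ethernet header directly. The sketch below packs and parses such a frame with Python's `struct` module, assuming no IP or TCP headers in between. The header layout, field sizes, and opcode value are hypothetical, since TTP's wire format is not described here, and 0x88B5 is the IEEE 802 Local Experimental EtherType, used only as a placeholder.

```python
import struct

# IEEE 802 Local Experimental EtherType, used as a placeholder;
# it is NOT the EtherType TTP actually uses.
ETHERTYPE_PLACEHOLDER = 0x88B5
ETH_HEADER_LEN = 14   # destination MAC + source MAC + EtherType
MIN_FRAME_LEN = 60    # minimum Ethernet frame size, excluding the 4-byte FCS

# Hypothetical fixed-size transport header, network byte order:
# opcode (1 B), reserved (1 B), connection id (2 B),
# sequence number (4 B), payload length (2 B).
TRANSPORT_HDR = struct.Struct("!BBHIH")


def build_frame(dst_mac, src_mac, opcode, conn_id, seq, payload):
    """Build an Ethernet frame carrying the hypothetical transport header."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    eth = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_PLACEHOLDER)
    hdr = TRANSPORT_HDR.pack(opcode, 0, conn_id, seq, len(payload))
    frame = eth + hdr + payload
    # Pad short frames to the Ethernet minimum; the length field lets the
    # receiver ignore the padding.
    return frame + bytes(max(0, MIN_FRAME_LEN - len(frame)))


def parse_frame(frame):
    """Parse a frame built by build_frame and return its header fields."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_PLACEHOLDER:
        raise ValueError(f"unexpected EtherType 0x{ethertype:04x}")
    opcode, _reserved, conn_id, seq, length = TRANSPORT_HDR.unpack_from(
        frame, ETH_HEADER_LEN)
    start = ETH_HEADER_LEN + TRANSPORT_HDR.size
    return {"opcode": opcode, "conn_id": conn_id, "seq": seq,
            "payload": frame[start:start + length]}


frame = build_frame(b"\x02" * 6, b"\x04" * 6, opcode=1, conn_id=7, seq=42,
                    payload=b"hello")
print(len(frame), parse_frame(frame))
# 60 {'opcode': 1, 'conn_id': 7, 'seq': 42, 'payload': b'hello'}
```

A fixed-size header at a fixed offset is what makes this kind of format easy to parse in hardware: every field sits at a known byte position, with no options or variable-length headers to walk.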
To cut latency relative to TCP, TTP minimizes both the wait after a connection is closed (TCP's TIME_WAIT state) and the number of steps needed to close a connection. In TCP, the side that closes a connection sends a FIN and waits for it to be acknowledged, then receives the peer's FIN and acknowledges it, and only then enters the TIME_WAIT state. TTP simplifies this to an exchange of a close message and its acknowledgment (Close, Close-ACK).
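The difference between the two close procedures can be sketched as two small transition tables. The TCP table follows the usual active-close path from the TCP specification (RFC 9293), ignoring simultaneous close; the TTP table is a hypothetical rendering of the Close/Close-ACK exchange described above, and its state and event names are assumptions, not identifiers from Tesla's code.

```python
# Active-close path of a TCP connection (the side that sends FIN first).
TCP_ACTIVE_CLOSE = {
    ("ESTABLISHED", "send FIN"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv ACK"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv FIN, send ACK"): "TIME_WAIT",
    ("TIME_WAIT", "2*MSL timer expires"): "CLOSED",
}

# Hypothetical TTP close: one Close message and its acknowledgment.
TTP_CLOSE = {
    ("OPEN", "send CLOSE"): "CLOSE_SENT",
    ("CLOSE_SENT", "recv CLOSE_ACK"): "CLOSED",
}


def run(machine, start, events):
    """Apply events in order and return the list of states visited."""
    state, trace = start, [start]
    for event in events:
        if (state, event) not in machine:
            raise ValueError(f"event {event!r} is invalid in state {state}")
        state = machine[(state, event)]
        trace.append(state)
    return trace


tcp = run(TCP_ACTIVE_CLOSE, "ESTABLISHED",
          ["send FIN", "recv ACK", "recv FIN, send ACK", "2*MSL timer expires"])
ttp = run(TTP_CLOSE, "OPEN", ["send CLOSE", "recv CLOSE_ACK"])
print(tcp)  # ['ESTABLISHED', 'FIN_WAIT_1', 'FIN_WAIT_2', 'TIME_WAIT', 'CLOSED']
print(ttp)  # ['OPEN', 'CLOSE_SENT', 'CLOSED']
```

In TCP, TIME_WAIT lasts twice the maximum segment lifetime (MSL); Linux, for example, fixes it at 60 seconds. The TTP sketch has no timed state at all, so the connection is fully closed as soon as the acknowledgment arrives.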