X's New Algorithm Revealed: Likes Are Almost Worthless, This Action's Value Skyrockets 150 Times
- Core Viewpoint: X (formerly Twitter) has open-sourced its new recommendation algorithm, Phoenix. The core change is a shift from the old algorithm, which relied on manually defined features, to a model driven end-to-end by a Grok-based transformer. It ranks content by predicting 15 types of user interaction behaviors, fundamentally altering the underlying logic of content distribution.
- Key Elements:
- Algorithm Architecture Innovation: The new Phoenix completely abandons manual feature engineering, instead using the Grok transformer model to predict and recommend content based on sequences of user historical behaviors (such as liking, replying, blocking).
- Content Ranking Mechanism: The algorithm calculates a total score to determine content ranking by predicting the probability of 15 potential user behaviors—both positive (e.g., reply, retweet) and negative (e.g., report, block)—and performing a weighted sum.
- Creator Strategy Changes: Old tricks like "best time to post" are now ineffective. The new system encourages creators to actively reply to comments (which carries extremely high weight), avoid triggering negative user behaviors like blocking, and suggests placing external links in comment sections rather than the main post body.
- Transparency and Limitations: While the open-source release provides the complete system architecture and logic, it does not disclose specific weight parameters, internal model parameters, or training data, leaving a gap from achieving "full transparency."
Original Author: David, TechFlow
On the afternoon of January 20, X open-sourced its new recommendation algorithm.
Musk's accompanying reply was quite interesting: "We know the algorithm is stupid, it needs a lot of work, but at least you can see us struggling in real-time to improve. Other social media platforms don't dare do this."

This statement has two layers of meaning. First, it acknowledges the algorithm has problems; second, it uses "transparency" as a selling point.
This is the second time X has open-sourced its algorithm. The 2023 version of the code hadn't been updated for three years and was long disconnected from the actual system. This time it's a complete rewrite, with the core model switching from traditional machine learning to the Grok transformer. The official statement is that it "completely eliminates manual feature engineering."
In plain language: the old algorithm relied on engineers manually tuning parameters; now, AI directly looks at your interaction history to decide whether to push your content.
For content creators, this means the old mystique of "the best time to post" or "which tags grow followers" might no longer work.
We also looked through the open-sourced GitHub repository and, with the help of AI, found some hard-coded logic in the code worth digging into.
Algorithm Logic Change: From Manual Definition to AI Automatic Judgment
First, let's clarify the difference between the old and new versions; otherwise, the following discussion might get confusing.
In 2023, the version Twitter open-sourced was called Heavy Ranker, essentially traditional machine learning. Engineers had to manually define hundreds of "features": whether the post had an image, how many followers the poster had, how long ago it was posted, whether the post contained a link...
Then, weights were assigned to each feature, tweaked back and forth to see which combination performed best.
This newly open-sourced version is called Phoenix, with a completely different architecture. You can think of it as an algorithm more reliant on large AI models, with its core using Grok's transformer model, the same type of technology used by ChatGPT and Claude.
The official README document states bluntly: "We have eliminated every single hand-engineered feature."
All those traditional rules relying on manually extracted content features are gone, completely eliminated.
So now, what does this algorithm rely on to judge whether a piece of content is good or not?
The answer is your behavioral sequence. What you've liked in the past, who you've replied to, which posts you've stayed on for over two minutes, which types of accounts you've blocked. Phoenix feeds these behaviors to the transformer, letting the model learn the patterns and summarize them itself.

An analogy: the old algorithm was like a manually written scoring sheet, ticking boxes for points;
The new algorithm is like an AI that has seen all your browsing history, directly guessing what you want to see next.
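To make the analogy concrete, here is a hypothetical sketch of what a behavior-sequence input might look like. None of these field names come from the Phoenix code; they only illustrate the kind of event stream the article describes (likes, replies, dwell time, blocks) being fed to a model as a sequence rather than as hand-engineered features.

```python
from dataclasses import dataclass

@dataclass
class UserEvent:
    """One entry in a user's behavior history (illustrative fields only)."""
    action: str        # e.g. "like", "reply", "block", "dwell"
    author_id: str     # whose content the action targeted
    dwell_seconds: float = 0.0

# A toy behavior sequence: the raw signal the article says Phoenix learns from,
# instead of hand-crafted features like "post has an image".
history = [
    UserEvent("like", "author_a"),
    UserEvent("dwell", "author_b", dwell_seconds=130.0),  # stayed over two minutes
    UserEvent("reply", "author_b"),
    UserEvent("block", "author_c"),
]

# A transformer would consume this as a token sequence; a trivial encoding:
tokens = [f"{e.action}:{e.author_id}" for e in history]
print(tokens)
```

The point of the sketch is only the shape of the input: an ordered stream of actions the model can find patterns in on its own.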
For creators, this means two things:
First, old tricks like "optimal posting time" or "golden tags" carry far less weight, because the model no longer looks at these fixed features; it looks at each user's personal preferences.
Second, whether your content gets pushed increasingly depends on how people who see it will react. This reaction is quantified into 15 types of behavior predictions, which we'll discuss in detail in the next section.
The Algorithm Predicts Your 15 Types of Reactions
After Phoenix receives a post to be recommended, it predicts 15 possible behaviors the current user might have upon seeing this content:
- Positive Behaviors: Such as like, reply, retweet, quote retweet, click on the post, click on the author's profile, watch more than half of a video, expand an image, share, stay for a certain duration, follow the author.
- Negative Behaviors: Such as clicking "Not interested," blocking the author, muting the author, or reporting the post.
Each behavior corresponds to a predicted probability. For example, the model judges you have a 60% probability of liking this post, a 5% probability of blocking this author, etc.
Then the algorithm does a simple thing: multiplies these probabilities by their respective weights and sums them up to get a total score.

The formula looks like this:
Final Score = Σ ( weight × P(action) )
Positive behaviors have positive weights; negative behaviors have negative weights.
Posts with high total scores rank higher; low ones sink.
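The scoring step itself is simple to sketch in code. The weights below are made up for illustration (the new release does not publish them); only the shape of the formula, a weighted sum of predicted action probabilities, comes from the README.

```python
# Hypothetical weights: positive actions positive, negative actions negative.
# These numbers are illustrative, not the (undisclosed) Phoenix values.
WEIGHTS = {
    "like": 0.5,
    "reply": 13.5,
    "retweet": 1.0,
    "report": -369.0,
    "block": -74.0,
}

def final_score(predicted_probs: dict) -> float:
    """Final Score = sum over actions of weight * P(action)."""
    return sum(WEIGHTS[action] * p for action, p in predicted_probs.items())

# Two candidate posts: one likely to spark replies, one likely to get reported.
post_a = {"like": 0.6, "reply": 0.2, "retweet": 0.1, "report": 0.0, "block": 0.0}
post_b = {"like": 0.6, "reply": 0.0, "retweet": 0.0, "report": 0.05, "block": 0.0}

print(final_score(post_a))  # positive: ranks higher
print(final_score(post_b))  # dragged down by a 5% report probability
```

Even a small predicted probability of a heavily penalized action (here, a report) can sink a post that would otherwise score well.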
Stepping away from the formula, it boils down to this:
Whether content is "good" is no longer determined by how well the content itself is written (though readability and delivering value to readers are still the foundation of reach); it increasingly depends on what reaction the content will elicit from you. The algorithm doesn't judge the quality of the post itself; it only cares about your behavior.
Thinking along these lines, in extreme cases, a vulgar post that people can't help but reply to and mock might score higher than a high-quality post with no interaction. This might be the underlying logic of this system.
However, the newly open-sourced version does not disclose the specific numerical values of the behavior weights; the 2023 version did.
Old Version Reference: One Report = 738 Likes
Next, we can dig into that set of data from '23. Although old, it helps you understand how much "value" different behaviors have in the algorithm's eyes.
On April 5, 2023, X did publish a set of weight data on GitHub.
Straight to the numbers. The key weights from that release (the originals were published as a table in the repository; the values below are the ones cited in this article plus two well-documented entries from the same file):
- Like: 0.5
- Retweet: 1.0
- Reply: 13.5
- You reply, and the author replies back to you: 75.0
- Watch more than half of a video: 0.005
- Negative feedback (block or mute): -74
- Report: -369
Data source: the old twitter/the-algorithm-ml repository on GitHub.
A few numbers are worth a closer look.
First, likes are almost worthless. The weight is only 0.5, the lowest among all positive behaviors. In the algorithm's eyes, the value of a like is approximately zero.
Second, conversational interaction is the real hard currency. The weight for "you reply, and the author replies back to you" is 75, which is 150 times that of a like. What the algorithm wants to see most is not one-way likes, but back-and-forth dialogue.
Third, negative feedback carries an extremely high cost. One Block or Mute (-74) requires 148 likes to offset. One report (-369) requires 738 likes. Furthermore, these negative points accumulate in your account's reputation score, affecting the distribution of all subsequent posts.
Fourth, the weight for video completion rate is absurdly low. Only 0.005, almost negligible. This is in stark contrast to Douyin and TikTok, which treat completion rate as a core metric.
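The multiples quoted above follow directly from the published 2023 weights; a quick arithmetic check:

```python
# Published 2023 weights (from the twitter/the-algorithm-ml repository).
LIKE = 0.5
CONVO_REPLY = 75.0      # you reply, and the author replies back
BLOCK_OR_MUTE = -74.0
REPORT = -369.0

print(CONVO_REPLY / LIKE)         # 150.0 -> one back-and-forth reply = 150 likes
print(abs(BLOCK_OR_MUTE) / LIKE)  # 148.0 -> likes needed to offset one block/mute
print(abs(REPORT) / LIKE)         # 738.0 -> likes needed to offset one report
```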
The official document also states: "The exact weights in the file can be adjusted at any time... Since then, we have periodically adjusted the weights to optimize for platform metrics."
Weights can be adjusted at any time, and indeed have been adjusted.
The new version didn't disclose specific numbers, but the logical framework written in the README is the same: positive actions add points, negative ones subtract, weighted sum.
The specific numbers might have changed, but the order of magnitude relationship likely remains. You replying to someone's comment is more useful than receiving 100 likes. Making someone want to Block you is worse than having no interaction.
Knowing This, What Can We Creators Do?
After digging through Twitter/X's old and new algorithm code, here are a few actionable conclusions.
1. Reply to your commenters. In the weight table, "Author replies to commenter" is the highest scoring item (+75), 150 times higher than a user's one-way like. This isn't about begging for comments, but replying when someone comments. Even a simple "thanks" gets noted by the algorithm.
2. Don't make people want to scroll away. The negative impact of one block requires 148 likes to offset. Controversial content does easily spark interaction, but if the interaction is "this person is annoying, block," your account's reputation score will be continuously damaged, affecting the distribution of all your future posts. Controversial traffic is a double-edged sword; it cuts you before it cuts others.
3. Put external links in the comments. The algorithm doesn't want to export users off the platform. Including links in the main text gets demoted, something Musk himself has publicly stated. If you want to drive traffic, write the content in the main post and put the link in the first comment.
4. Don't spam. The new code has an Author Diversity Scorer, which demotes posts from the same author appearing consecutively. The design intent is to diversify a user's feed; the side effect is that posting ten times in a row is worse than posting one well-crafted piece.
5. There's no "best time to post" anymore. The old algorithm used posting time as a hand-crafted feature; the new version eliminates it. Phoenix looks only at user behavior sequences, not at what time a post went out. Guides claiming "Tuesday at 3 PM is the best time to post" now carry little weight.
The above are things readable from the code level.
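The author diversity demotion in point 4 can be sketched as follows. The decay factor and function name are assumptions for illustration; only the behavior itself (consecutive posts from the same author get progressively demoted) is described in the open-sourced code.

```python
def apply_author_diversity(ranked_posts, decay=0.5):
    """Demote repeated appearances of the same author in a ranked feed.

    ranked_posts: list of (author_id, score), already sorted by score.
    Each additional post from an author multiplies its score by `decay`
    raised to the number of that author's posts already kept.
    (Illustrative sketch; not the actual Phoenix implementation.)
    """
    seen_counts = {}
    adjusted = []
    for author, score in ranked_posts:
        n = seen_counts.get(author, 0)
        adjusted.append((author, score * (decay ** n)))
        seen_counts[author] = n + 1
    return sorted(adjusted, key=lambda item: item[1], reverse=True)

# One author spamming three high-scoring posts still gets interleaved
# with other authors after the demotion.
feed = [("alice", 10.0), ("alice", 9.0), ("alice", 8.0), ("bob", 6.0)]
print(apply_author_diversity(feed))
```

Under any such scheme, the practical effect matches the advice above: the second and third consecutive posts fight an uphill battle, so one well-crafted post beats ten in a row.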
There are also some bonus/deduction items from X's public documentation, not in this open-sourced repository: Blue check verification has a bonus, all caps gets demoted, sensitive content triggers an 80% reach reduction. These rules aren't open-sourced, so we won't expand on them.
In summary, what was open-sourced this time is quite substantial.
The complete system architecture, candidate content retrieval logic, ranking/scoring process, implementation of various filters. The code is mainly in Rust and Python, with a clear structure, and the README is more detailed than many commercial projects.
But a few key things weren't released.
1. Weight parameters weren't disclosed. The code only states that positive behaviors add points and negative behaviors subtract them; how many points a like is worth, or how many a block deducts, isn't stated. The 2023 version at least showed the numbers; this time, only the formula framework was given.
2. Model weights weren't disclosed. Phoenix uses the Grok transformer, but the parameters of the model itself weren't released. You can see how the model is called, but not how it calculates internally.
3. Training data wasn't disclosed. What data the model was trained on, how user behavior was sampled, how positive/negative samples were constructed—none of this was mentioned.
An analogy: this open-sourcing is equivalent to telling you "we use weighted summation to calculate the total score," but not telling you what the weights are; telling you "we use a transformer to predict behavior probability," but not showing you what's inside the transformer.
For comparison, TikTok and Instagram haven't disclosed even this much. X's release genuinely contains more information than other mainstream platforms have shared. It's just still some distance from "complete transparency."
This isn't to say the open-sourcing has no value. For creators and researchers, seeing the code is better than not seeing it.


