Mistral vs Llama vs DeepSeek – A Comparative Study for Exolynk Data Translations
Reading time: 12 min
Summary
In a comparative study, we tested five large language models (LLMs) for quality, speed, and cost in multilingual data translation with the Exolynk platform. DeepSeek-V3 provided the best translations but was too slow. Mixtral-8x7B offered good value for money, while Llama 3.3 70B achieved the best combination of quality and speed. Therefore, we chose Llama 3.3 70B as our preferred model. The study highlights that efficient AI translations are crucial for international, collaborative platforms.
The Need for Multilingual Data
The Exolynk platform was designed from the outset as a multilingual software solution that not only considers the user interface (UI) and system texts but also the data within the system itself. This functionality remains unique in the software industry and provides a significant advantage, especially in international, collaborative work environments where language barriers must be overcome. In multilingual countries like Switzerland and for globally operating companies, this type of multilingual data support is essential.
Multilingualism offers crucial advantages in various application areas:
- Customer Support: Global companies must handle support inquiries in different languages without relying on manual translations.
- Manufacturing: Factories and production facilities often employ international teams. To ensure that each step is carried out accurately, native translations of design data are essential.
- Research and Collaboration: International research projects require platforms capable of collecting and analyzing data in multiple languages.
Example: data translation on the Exolynk platform (animated demo)
The Challenge: Creating Multilingual Datasets
In the past, creating and maintaining multilingual datasets was a labor-intensive and expensive process. Manual translations were costly and often failed to consider context or specialized terminology. Machine translations, such as Google Translate, offer a quick solution but are often inaccurate, particularly with complex or industry-specific content. Addressing these shortcomings is where large language models (LLMs) come into play.
LLMs as a Solution: Advances in Machine Translation
The availability of LLMs has revolutionized machine translation. These models understand multiple languages and go beyond simple word-for-word translations. They consider context and provide higher-quality results, recognizing technical language and nuances. Another advantage is the pace of progress: each new model generation delivers noticeably better translation accuracy.
Model Selection for Exolynk: Focus on Privacy and Efficiency
Choosing the right model for Exolynk required not only finding a powerful translation model but also meeting strict data privacy requirements. In collaboration with Together.ai, an inference provider for open-source models that does not use customer data for training, we tested several models to achieve the best translation results without compromising privacy or efficiency.
The following models were evaluated:
| Model Name | Number of Parameters | Quantization | Cost per 1 Million Tokens |
|---|---|---|---|
| Mixtral-8x22B Instruct | 141B | FP16 | $1.20 |
| Mixtral-8x7B Instruct | 46.7B | FP16 | $0.60 |
| DeepSeek-V3 | 671B | FP8 | $1.25 |
| Llama 3.3 70B Instruct Turbo | 70B | FP8 | $0.88 |
| Llama 3.1 8B Instruct Turbo | 8B | FP8 | $0.18 |
The number of parameters directly impacts a model's behavior: more parameters generally mean more precise output, but also slower inference and higher costs. Therefore, selecting a model with the best price-performance ratio is crucial for efficient translation.
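To make the price differences concrete, here is a rough batch-cost calculation using the per-token prices from the table above. The token count per record is an illustrative assumption, not a measured value.

```python
# Rough cost estimate for batch translation, using the per-token prices above.
# `tokens_per_record` is an assumed average (prompt + completion), not measured.

PRICE_PER_MTOK = {
    "Mixtral-8x22B Instruct": 1.20,
    "Mixtral-8x7B Instruct": 0.60,
    "DeepSeek-V3": 1.25,
    "Llama 3.3 70B Instruct Turbo": 0.88,
    "Llama 3.1 8B Instruct Turbo": 0.18,
}

def batch_cost(model: str, records: int, tokens_per_record: int = 120) -> float:
    """Cost in USD to translate `records` JSON records with the given model."""
    total_tokens = records * tokens_per_record
    return total_tokens / 1_000_000 * PRICE_PER_MTOK[model]

# 10,000 records at ~120 tokens each = 1.2M tokens
print(f"${batch_cost('Llama 3.3 70B Instruct Turbo', 10_000):.2f}")  # $1.06
```

At these volumes even the most expensive model stays cheap in absolute terms, which is why speed and quality ended up dominating the evaluation.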
Prompt Design: Automated Translations with Context
Exolynk allows users to manually translate data but aims to automate the translation process. The goal is to feed the model structured JSON data and receive correct translations in multiple languages.
Example of a JSON data format:
```json
{
  "de": "Hallo Welt",
  "en": "",
  "fr": "",
  "it": ""
}
```
System-Prompt:
You are a translator who is translating an input JSON structure to all given languages inside the JSON. You return a JSON with all languages correctly translated. It is super important that you only reply with the valid JSON and no other text or descriptions.
This precise instruction ensures that the model delivers only the translated JSON structure without additional explanations or text.
Results: Comparing Model Performance
In our tests, we evaluated the models based on quality, speed, and cost. Quality was assessed by comparing model output against manual reference translations; speed was measured in tokens per second (tokens/s); and cost in dollars per one million tokens.
| Model Name | Quality (%) | Price per 1 Million Tokens | Tokens/s | Overall Rating (%) |
|---|---|---|---|---|
| Mixtral-8x22B Instruct | 78.21% | $1.20 | 67.49 | 65.22% |
| Mixtral-8x7B Instruct | 98.57% | $0.60 | 119.61 | 60.11% |
| DeepSeek-V3 | 99.29% | $1.25 | 10.92 | 66.43% |
| Llama 3.3 70B Instruct Turbo | 95.71% | $0.88 | 207.41 | 79.32% |
| Llama 3.1 8B Instruct Turbo | 83.21% | $0.18 | 266.67 | 61.07% |
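The exact weighting behind the overall rating is not disclosed in the article. As an illustration of how such a composite score can be built, the sketch below min-max-normalizes the three metrics and averages them with equal weights; with a different weighting (quality counted more heavily, for example) the ranking shifts, which is why the numbers here do not reproduce the table above.

```python
# Illustrative composite score: min-max normalization + equal weights.
# The article's actual weighting is not published; this only shows the approach.

def overall_score(models: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    quality = [q for q, _, _ in models.values()]
    price = [p for _, p, _ in models.values()]
    speed = [s for _, _, s in models.values()]

    def norm(x: float, lo: float, hi: float, invert: bool = False) -> float:
        v = (x - lo) / (hi - lo) if hi > lo else 1.0
        return 1.0 - v if invert else v

    return {
        name: round(100 * (
            norm(q, min(quality), max(quality))
            + norm(p, min(price), max(price), invert=True)  # cheaper is better
            + norm(s, min(speed), max(speed))
        ) / 3, 2)
        for name, (q, p, s) in models.items()
    }

scores = overall_score({
    "Mixtral-8x22B Instruct": (78.21, 1.20, 67.49),
    "Mixtral-8x7B Instruct": (98.57, 0.60, 119.61),
    "DeepSeek-V3": (99.29, 1.25, 10.92),
    "Llama 3.3 70B Instruct Turbo": (95.71, 0.88, 207.41),
    "Llama 3.1 8B Instruct Turbo": (83.21, 0.18, 266.67),
})
print(scores)
```

Even under this simplified scheme, DeepSeek-V3's extreme slowness drags its score far below Llama 3.3 70B despite its quality lead.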
Evaluation chart: quality, price, speed, and overall rating per model
Key Findings:
- DeepSeek-V3: Highest quality but the slowest performance.
- Mixtral-8x22B Instruct: Lowest translation quality in the test, combined with slow throughput for its price class.
- Mixtral-8x7B Instruct: Good quality, but minor errors in complex translations. Good price-performance ratio.
- Llama 3.1 8B Instruct Turbo: Low quality due to its small model size. Cheap and fast but unsuitable for demanding translation tasks.
- Llama 3.3 70B Instruct Turbo: Best model in terms of quality and speed. Ideal for fast, precise translations.
Conclusion: Choosing the Right Model for Exolynk
After extensive analysis and testing, we decided to use the Llama 3.3 70B Instruct Turbo model for Exolynk’s translation functions. It offers the best combination of translation quality, speed, and cost—perfectly meeting the needs of our multilingual platform. Despite the excellent quality of DeepSeek-V3, its slow processing speed was a decisive factor in our decision.
Choosing the right model is crucial for a seamless user experience, especially in an international and multilingual context, and Llama 3.3 70B has proven to be the optimal model.