HANOI, Vietnam, June 5, 2026 /PRNewswire/ — FPT Corporation and NVIDIA today announced the release of the Nemotron-Personas-Vietnam dataset to advance sovereign AI development across Southeast Asia. The dataset is licensed for commercial use, giving developers, researchers, and enterprises an open, auditable resource for building AI systems that better reflect Vietnam’s language, culture, workforce, and economic realities.
Nemotron-Personas-Vietnam extends NVIDIA’s open Nemotron ecosystem of models, datasets, evaluation resources, and NVIDIA NeMo libraries, enabling developers to customize, evaluate, and deploy AI systems for local use cases.
Equipping Innovators to Build AI That Reflects Local Realities
The collaboration between FPT and NVIDIA is driven by a shared goal: to give AI innovators open, efficient models, datasets, and libraries to adapt their AI systems so they reflect local language, culture, regulations, data infrastructure, and economic goals, rather than relying on globally generic models that fail to serve specific communities.
NVIDIA contributed the open model framework, the NeMo Data Designer synthetic data library, and the Nemotron-Personas methodology: a structured approach to building population-scale synthetic datasets that are auditable, demographically grounded, and developer-ready.
FPT, as an NVIDIA Preferred Partner, contributed deep local expertise, validation methodologies, data infrastructure, and AI research capabilities through three key entities:
- FPT Smart Cloud: Provides the NVIDIA-accelerated GPU cloud services and inference-ready AI platforms that underpin the dataset’s development and deployment.
- Quantum AI and Cyber Security Institute: Provides research expertise and capabilities, leading the technical methodology and validation of the Nemotron-Personas-Vietnam dataset.
- FPT DC5: Operates field surveys, contributing survey-collected persona data and logistical resources to the data pipeline.
Grounding AI in Vietnam’s Language, Demographics, and Labor Reality
The Nemotron-Personas collection extends NVIDIA’s Nemotron model family with population-scale synthetic datasets grounded in real-world demographic and labor statistics. These are structured, auditable datasets that mirror how people actually live, work, and communicate.
The Nemotron-Personas-Vietnam dataset applies this methodology to Vietnam, capturing the linguistic diversity, demographic breadth, and labor characteristics specific to the Vietnamese population.
The Nemotron-Personas-Vietnam dataset comprises 900,000 synthetic personas grounded in the country’s latest official statistics and geographic structure. Each record contains 31 fields: 9 persona fields, 6 persona attributes, 15 contextual attributes, and 1 unique identifier, giving developers precise control to filter and target specific population subsets. The dataset is openly available on Hugging Face and is compatible with NVIDIA NeMo libraries across the full AI development lifecycle, from data curation and fine-tuning through post-training and deployment.
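To illustrate how a fixed per-record schema supports subset targeting, here is a minimal sketch that checks the stated field breakdown (9 + 6 + 15 + 1 = 31) and filters a handful of records by attribute. The field names (`uuid`, `province`, `occupation`, `age`), the sample values, and the `filter_personas` helper are hypothetical. This release does not publish the actual schema or the Hugging Face repository ID, so the records below are in-memory stand-ins rather than a download of the real dataset.

```python
# Field-group sizes as stated in the release; the group labels are hypothetical.
FIELD_GROUPS = {
    "persona_fields": 9,
    "persona_attributes": 6,
    "contextual_attributes": 15,
    "unique_identifier": 1,
}


def total_fields(groups: dict[str, int]) -> int:
    """Sum the number of fields across all groups in a record schema."""
    return sum(groups.values())


def filter_personas(records: list[dict], **criteria) -> list[dict]:
    """Return records whose fields equal every key=value pair in criteria."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Hypothetical sample records; real records would carry all 31 fields.
records = [
    {"uuid": "a1", "province": "Hà Nội", "occupation": "teacher", "age": 34},
    {"uuid": "b2", "province": "Đà Nẵng", "occupation": "fisher", "age": 51},
    {"uuid": "c3", "province": "Hà Nội", "occupation": "nurse", "age": 28},
]

if __name__ == "__main__":
    print(total_fields(FIELD_GROUPS))  # 31
    hanoi = filter_personas(records, province="Hà Nội")
    print([r["uuid"] for r in hanoi])  # ['a1', 'c3']
```

Exact-match filtering keeps the sketch simple. On the real dataset, an equivalent predicate could be passed to a dataframe query or to the Hugging Face `datasets` library's `Dataset.filter` once the actual field names are known.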
“FPT believes that sovereign AI must be built from the ground up to reflect local language, culture, and economic realities. The Nemotron-Personas-Vietnam dataset represents our commitment to making localized AI development openly accessible for every innovator building AI solutions for Vietnam and the broader region,” said Associate Professor Dr. Ngo Xuan Bach, Director of AI Product Center, FPT Smart Cloud, and Director of the Quantum AI & Cyber Security Institute, FPT Corporation.
Putting Sovereign AI Into Production, At Scale, In-Country
Sovereign AI is especially important for countries and industries where generic models are not enough to meet specific goals. Nations need AI that speaks their language, understands their laws, and fits their local context. Building and deploying sovereign AI in-country requires a robust AI cloud platform equipped for accelerated computing and inference at scale.
Guided by the vision of “Build Your Own AI,” FPT is committed to mastering AI technologies and empowering AI innovators to train and deploy AI in-region through three integrated layers:
- NVIDIA-accelerated GPU cloud services, offering the compute foundation for training and running large-scale AI models in-region.
- Inference-ready AI platforms, providing the tools needed to deploy frontier AI models at scale.
- Ready-to-use AI applications, bringing sovereign AI capabilities directly to Vietnamese businesses and institutions.
Together, these layers form a complete sovereign AI stack, from raw data and open models to deployed, localized AI products, built for Vietnam and replicable across the region.



