Dahua unlocks the future of video security with Xinghan Large-Scale AI Models
October 22, 2025
A leader in the field of intelligent IoT centered around video, Dahua has always been at the forefront of AI research and development. The company recently launched an upgraded version of its Xinghan Large-Scale AI Models, which combine multimodal capabilities and industry knowledge to make video surveillance more intelligent than ever. This article takes a closer look at Xinghan and how it helps users achieve smarter security.
Addressing conventional AI challenges
Xinghan aims to address several challenges of conventional CNN-based AI: difficulty detecting small targets at long distances, false alarms caused by interference such as birds and leaves, and long customization cycles for developing new algorithms.
“During the industry’s digital and intelligent transformation, AI technology still faces challenges. While algorithm accuracy has reached high levels in some areas, demands for adaptive intelligence across complex, dynamic scenarios and higher accuracy continue to rise. Simultaneously, business needs are evolving from perception and simple cognition to complex cognition. Additionally, complex rule configuration and cumbersome interactions in practical applications hinder usability. With advancements in large model technology, Dahua launched the Xinghan Large-Scale AI Models to address these issues,” said Frank Fang, Overseas Product Director at Dahua, adding that Xinghan aims to solve real user pain points with the following five key differentiators:
From accuracy to precision: Enhancing detection in extreme conditions (for example, tiny targets, blurry images, and strong backlighting), ensuring stable and reliable recognition;
From customization to generalization: Greatly shortening the development cycle for custom algorithms and reducing complex steps;
From recognition to comprehension: Supporting not only routine behavior recognition but also understanding complex multi-target interactions;
From static to dynamic: Overcoming limitations of static rule configurations to enable autonomous scene parsing and dynamic adaptation;
Enhanced language and multimodal capabilities: Simplifying operations via natural language interaction; processing text, images and video to enable understanding and interaction with the world.
Different models
First introduced in 2023, Xinghan continues to evolve by combining multimodal intelligence and deep domain expertise. This development has led to three core series under Xinghan: Xinghan Vision Models (vision-centric intelligence), Xinghan Multimodal Models (multimodal-fusion capabilities), and Xinghan Language Models (language-driven interaction). This article examines the Vision and Multimodal Models more closely.
Xinghan Vision Models
The Xinghan Vision Models are featured in certain camera models under Dahua’s IPC and PTZ series. Since large models typically reside on servers, deploying Xinghan on edge devices requires shrinking the model through advanced training, a process that can be likened to a person’s education.
“First, we enable the algorithm to undergo unsupervised training on hundreds of millions of unlabeled data samples, resulting in a massive pre-trained model that is extensive and diverse, broad but not precise, somewhat like our primary and secondary school curricula that cover all foundational subjects without delving deeply into any specific field,” said Xiangming Zhou, R&D Expert at Dahua.
He adds: “To address our specific business needs, we then employ supervised training with labeled task-specific data to develop our expert task model. This labeled training phase can be likened to university education: students focus on their majors, continuously refining professional knowledge while gradually forgetting many secondary school subjects irrelevant to their specialization. To meet camera deployment requirements, we further perform knowledge distillation, fine-tuning, and quantization on the expert task model, significantly reducing its parameter count. This ultimately yields an edge-side large model precisely tailored for specific business objectives and products.”
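For readers who want a concrete picture of this pipeline, the minimal sketch below shows knowledge distillation followed by post-training quantization in PyTorch. The toy model sizes, temperature, and training loop are illustrative assumptions only; Dahua's actual models and tooling are proprietary and not described in the source.

```python
# A minimal distillation-and-quantization sketch (illustrative assumptions,
# not Dahua's implementation): a small "student" learns to mimic a larger
# "expert" teacher, then is quantized for edge deployment.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins: a large expert task model (teacher) and a much
# smaller edge model (student).
teacher = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 10))
student = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 10))

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Knowledge distillation: the student matches the teacher's softened outputs."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_probs, soft_targets, reduction="batchmean") * temperature**2

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
for _ in range(100):  # random tensors stand in for labeled task-specific data
    x = torch.randn(32, 512)
    with torch.no_grad():
        t_logits = teacher(x)
    loss = distillation_loss(student(x), t_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Post-training dynamic quantization converts Linear weights to int8,
# further shrinking the student for edge deployment.
edge_model = torch.quantization.quantize_dynamic(student, {nn.Linear}, dtype=torch.qint8)
```

Distillation cuts the parameter count while preserving most of the expert model's behavior, and quantization then reduces the memory and compute footprint of the weights that remain, which is what makes camera-side deployment feasible.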
The Xinghan Vision Models make video analysis more accurate and intelligent, enabling various applications. One of them is Perimeter Protection: detection distance is increased by 50 percent, detection accuracy still reaches 98 percent, and the false alarm rate is reduced by 92 percent. Building on the capabilities of the Xinghan Large-Scale AI Models, Perimeter Protection introduces the AI Rule Assist function, which automatically analyzes the scene and generates regional intrusion rule lines, making configuration simple and efficient. Perimeter Protection also supports detection of more than 10 animal types, bringing more value to users.
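To illustrate how an automatically generated intrusion rule might be applied at detection time, here is a minimal sketch assuming the rule is stored as a polygon in image coordinates and each detected target is reduced to an anchor point. The region, detections, and ray-casting test are illustrative, not Dahua's implementation.

```python
# Illustrative sketch: test each detected target against an intrusion region.
from typing import List, Tuple

Point = Tuple[float, float]

def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count how many polygon edges a horizontal ray from p crosses."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rule region generated by scene analysis, plus two detections.
region = [(100, 400), (500, 380), (520, 700), (80, 720)]
detections = {"person_1": (300, 500), "person_2": (600, 200)}
for target, pos in detections.items():
    if point_in_polygon(pos, region):
        print(f"Intrusion alarm: {target} inside protected region")
```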
Xinghan Multimodal Models
Compared to unimodal models, which are confined to processing a single data type (for example, text-only or image-only), the Xinghan Multimodal Models are artificial intelligence systems capable of processing multiple heterogeneous data types (such as text, images, and video) in parallel and integrating them deeply, empowering diverse applications such as WizSeek and text-defined alarms.
Leveraging the power of Dahua Xinghan Multimodal Model technology, WizSeek transforms video retrieval. It aims to solve video retrieval pain points such as the lack of multi-condition retrieval and over-reliance on preset target events. Suppose the user wants to find a man making a phone call near a car. With conventional metadata search, the user can only select attributes one by one, and behaviors such as “calling” cannot be retrieved at all. With WizSeek, the user just needs to type “a man making a phone call near a car” to locate the footage in a matter of seconds. WizSeek delivers speed, precision, and efficiency when navigating vast amounts of video clips, along with an intuitive and streamlined user journey.
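A minimal sketch of the retrieval idea behind a system like WizSeek: text and video clips are embedded into a shared vector space, and a query is answered by ranking clips by cosine similarity. The random stand-in embeddings below are assumptions; a real system would use a trained multimodal encoder for both the query and the indexed clips.

```python
# Illustrative sketch of text-to-video retrieval over a joint embedding space.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a real system these vectors come from a multimodal encoder that maps
# text and video into the same space; random vectors stand in here.
rng = np.random.default_rng(0)
clip_index = {f"clip_{i:04d}.mp4": rng.normal(size=512) for i in range(1000)}

def search(query_embedding: np.ndarray, top_k: int = 5):
    """Rank indexed clips by similarity to the embedded text query."""
    scored = [(name, cosine_similarity(query_embedding, emb))
              for name, emb in clip_index.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]

# "A man making a phone call near a car" would be encoded by the text tower;
# a random vector stands in for that embedding here.
query = rng.normal(size=512)
for name, score in search(query):
    print(f"{name}: {score:.3f}")
```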
Text-defined alarms, meanwhile, build custom arming rules from a text description: new algorithms can be developed through a prompt, greatly lowering the development threshold. With conventional AI, for example, creating an algorithm for “human pushing a stroller” requires material collection, data annotation, on-device development, and algorithm training, a process that takes about a month. With text-defined alarms, powered by the multimodal models’ capabilities, the user only needs to type “human pushing a stroller,” and a model is created and deployed in seconds. After creating a new text-defined alarm algorithm on a recorder (IVSS), the user can perform local training within the same device to optimize its performance, saving significant time and labor costs; the optimized algorithm truly realizes “more use, more accuracy.” The Xinghan Multimodal Models are featured in Dahua products including NVR, IVSS, and IVD.
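Continuing the same joint-embedding assumption, a text-defined alarm can be pictured as a prompt embedding compared against live frame embeddings, with frames above a similarity threshold raising the alarm. The encoders, threshold, and scores below are all illustrative stand-ins, not Dahua's actual mechanism.

```python
# Illustrative sketch of a text-defined alarm: the "algorithm" is just an
# embedded prompt, and frames are scored against it in real time.
import numpy as np

def embed_text(prompt: str) -> np.ndarray:
    """Stand-in for the multimodal text encoder (hash-seeded random vector)."""
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).normal(size=512)

def embed_frame(rng: np.random.Generator) -> np.ndarray:
    """Stand-in for the multimodal image encoder applied to a video frame."""
    return rng.normal(size=512)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Creating the "algorithm" takes seconds: embed the prompt once.
alarm_rule = embed_text("human pushing a stroller")
THRESHOLD = 0.05  # in practice this would be tuned by on-device local training

rng = np.random.default_rng(1)
for frame_id in range(5):
    score = cosine(alarm_rule, embed_frame(rng))
    status = "ALARM" if score > THRESHOLD else "ok"
    print(f"frame {frame_id}: score={score:+.3f} [{status}]")
```

The on-device local training the article mentions would, under this picture, amount to refining the rule embedding and threshold from frames the user confirms or rejects, which is how "more use" can yield "more accuracy."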
A leader in AI technologies
In closing, video surveillance has evolved from seeing a scene to understanding a scene. Dahua has clearly embraced this trend with Xinghan, which understands complex multi-target interactions, reduces false alarms, and shortens deployment cycles, in the process helping users gain more security and business intelligence. This companion piece to our article further explores the user-facing innovations that come with Xinghan.
With Xinghan, Dahua shows the world what next-generation AI can do, and once again proves itself a leader in advanced AI technologies.