The integration of Digital Twin and Artificial Intelligence technologies is reshaping manufacturing monitoring systems through synthetic data and advanced computer vision models. This paper presents an approach in which a Digital Twin of a factory generates synthetic datasets for training Vision Transformers on object detection and image segmentation in manufacturing processes. The study demonstrates improved accuracy in detecting and monitoring factory assets, validated on both synthetic and real-world datasets. An industrial case study further illustrates the approach's potential to identify anomalies.