Big Data platform
  • DAP distributed computing, CDC (Customized Distributed Computing): distributed computing customized for applications. Unstructured data processing; support for MapReduce/Spark; offline data; mapping simplification; modeling for different applications.
  • Distributed parallel database, CDPD: the physical structure is deployed across regions. Key technologies: WAN-based cross-region deployment; global data table space; local data storage and access, with no cross-node aggregation synchronization; global metadata consistency; scheduling and distribution of SQL request tasks; data parallelism.
  • Distributed data storage, CNHC: Hadoop-based NFS storage. The CeresData NFS Hadoop Connector lets Hadoop run on a single copy of the data held on NFS storage. High reliability and low cost. Read performance: single-node performance is increased by 3 times; out-of-order data reads, writes, and queries are supported.
  • Data preprocessing (data cleaning), CDPP (CeresData Data PreProcessing). Data cleaning concept: content from external data sources contains dirty data, that is, data with vacancies, noise, and other defects. Dirty data distorts the information obtained from the data, which affects data mining.
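
The data cleaning concept in the last item can be illustrated with a minimal sketch. It does not show how CDPP works internally. The function name `clean_column`, the median fill for vacancies, the median-absolute-deviation (MAD) rule for noise, and the threshold `k` are all assumptions chosen for illustration.

```python
from statistics import median


def clean_column(values, k=3.0):
    """Fill vacancies (None) with the median and clamp noisy outliers.

    Hypothetical helper, not a CDPP API. A value counts as noise when it
    lies more than k * MAD from the median, where MAD is the median
    absolute deviation of the values that are present.
    """
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to estimate from; return unchanged

    med = median(present)
    mad = median(abs(v - med) for v in present)

    cleaned = []
    for v in values:
        if v is None:
            cleaned.append(med)  # vacancy: substitute the median
        elif mad == 0:
            cleaned.append(v)    # no measurable spread: skip clamping
        else:
            lo, hi = med - k * mad, med + k * mad
            cleaned.append(min(max(v, lo), hi))  # noise: clamp into [lo, hi]
    return cleaned


readings = [10.0, 11.0, None, 9.0, 500.0, 10.0]
print(clean_column(readings))  # [10.0, 11.0, 10.0, 9.0, 13.0, 10.0]
```

In this example the vacancy is replaced by the median (10.0) and the noisy reading 500.0 is clamped to the upper bound 13.0, so a later aggregate such as a mean is no longer dominated by a single defective record.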